
Mozilla Firefox's Test Pilot plug-in offered a data set of browser habits in 2010.


Stashed in: Web Browsers, Firefox!, Quora!, Machine Learning, Crowdsourcing, Big Data!


From Mozilla Labs in 2010:

Considering the virtual reams of data we generate for companies like Facebook every day, they give us awfully little in return. While they sell the information to third parties or use it to display targeted advertisements, we're left with a largely anecdotal understanding of Internet habits. We can install programs to track our personal Internet usage, but it's difficult to place these individual habits in a broader context.

Since mid-2009, the folks behind Firefox have encouraged its users to install Test Pilot, a plug-in that collects anonymized browser usage. Test Pilot tracks many different "events" like booting up or shutting down the browser, adding a bookmark, and turning Firefox's private-browsing feature on and off. (Unsurprisingly, the private-browsing data have received the bulk of the attention.) Every few months, Mozilla Labs, the group in charge of Test Pilot, releases another set of data collected by the plug-in, often examining a specific aspect of browsing—say, what parts of the toolbar people click most. Last month, Mozilla Labs released its most comprehensive Test Pilot data set, the second version of what it's calling "A Week in the Life of a Browser."

The abundance of data in "Week in the Life," which covers a week's worth of 27,000 users' browsing activities, can be paralyzing. Faced with several gigabytes of decompressed data, where do you start? Tab usage, I decided.

In the past decade, the ability to open multiple—even dozens of—Web pages in a single window has shifted from a fringe, power-user feature to a mainstream offering. Today, every major browser supports tabbed browsing, even slow-to-evolve Internet Explorer. A browser with 27 open tabs has, fairly or not, come to symbolize the frenetic, attention-deficient aspects of our Web-centric lives. What can Firefox's data on this feature tell us about ourselves? You'll find some preliminary answers (and charts!) below. But first a few caveats about the data.

Companies like Nielsen pay people to let software spy on their Internet activity. These companies sell this information to advertising companies, academic researchers, and the like. It has led to handsome profits and fascinating insights. Researchers at the University of Chicago, for example, used Nielsen data to estimate the extent to which Web surfers self-segregate by ideology—a study Slate then used to create a "media isolation" profiler. But if you want access to most of Nielsen's data, you'll have to cough up some serious change. 

Mozilla Labs, on the other hand, gives Test Pilot data away for free—with tradeoffs. Data collection requires a balancing act between creepiness and the desire for useful detail. Because Mozilla wants to recruit as many participants as possible and isn't paying them, Test Pilot collects much less personal information than Nielsen does. There's no information on income, race, or location. In fact, only about 4,000 of the 27,000 users in the latest data dump answered such basic questions as: "What is your gender?," "How old are you?," and "How much time do you spend on the Web each day?" 

The demographic data Test Pilot users did provide, however, raise a giant red flag. Of course, Firefox users skew toward the type of people savvy enough to use a browser other than the one that comes pre-installed on their computer, typically Internet Explorer or Safari. And though Test Pilot recently passed the million-active-user mark, people who install it represent an even more specific class of technophilic users who are comfortable with plug-ins. (That's not to say bias isn't present in the Nielsen panel: It overrepresents people who don't mind a large corporation tracking their online habits.) What's remarkable, however, is how enormously Test Pilot underrepresents women—just 6.5 percent of the "Week in the Life" survey respondents said they were female. There are so few female-identifying participants—just 257—that we can have very little confidence whether the women in this study are representative of most other women.
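As a quick sanity check on the figures quoted above, here is a minimal Python sketch of the arithmetic. It assumes the 6.5 percent is measured against the people who answered the gender question, not against all 27,000 users. The variable names are my own, not anything from the Test Pilot data set.

```python
# Figures quoted in the article (the names are illustrative, not from the data set).
female_respondents = 257      # "just 257"
female_share = 0.065          # "just 6.5 percent"
total_users = 27_000          # users covered by "Week in the Life"
approx_respondents = 4_000    # "only about 4,000" answered the survey questions

# If 257 women make up 6.5 percent of respondents, this is the implied
# number of people who answered the gender question.
implied_respondents = female_respondents / female_share

# Rough share of all users who answered the demographic questions.
response_rate = approx_respondents / total_users

print(f"Implied respondents: {implied_respondents:.0f}")  # 3954
print(f"Response rate: {response_rate:.1%}")              # 14.8%
```

The implied figure of roughly 3,950 respondents lines up with the "about 4,000" mentioned earlier, so the quoted numbers are at least consistent with one another. It also means only about one user in seven supplied any demographic information at all.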

...

Though Mozilla continues to release additional Test Pilot results, more robust analyses beg for data from the other major browsers. (Google and Microsoft have collected similar user information for Chrome and Internet Explorer, but they don't have immediate plans to release the raw data publicly, their press officers tell me.) We'll also need professional statisticians—which I am decidedly not—to crunch the numbers with greater sophistication. And to make the data relevant outside the boundaries of your computer screen, we'll need to link it to research in sociology, psychology, and other disciplines. The Danah Boyds of this world could have a field day.

This could have been so much better. 

Mozilla has since pulled those data sets.

Quora has a list of other good publicly available data sets, but none of them includes browser data:

https://quora.com/Where-can-I-find-large-datasets-open-to-the-public

Why you should use open data to hone your machine learning models, by CrowdFlower:

https://www.crowdflower.com/why-you-should-use-open-data-to-hone-your-machine-learning-models/
