How the Census Bureau Collects, Provides Data - what happens when census is augmented with external sources?
Mo Data stashed this in Big Data Ethics and Privacy
"At a very high level, our data products result primarily from the implementation of censuses, sample surveys, administrative records, and statistical modeling. We conduct a census every 10 years that primarily provides counts of people (including limited demographics) and counts of all habitable dwellings in the United States. Every five years, we also conduct a census of the economy, measuring counts of different types of businesses and some characteristics of their activities, as well as a census of governments.
While censuses are our primary source of counts and some characteristics, sample surveys are conducted more frequently (e.g., monthly, quarterly, and annually) of people, housing, businesses, and governments to provide current estimates of their characteristics. Measures produced by our demographic sample surveys (many operated in partnership with other federal agencies) include unemployment and other labor force characteristics, income, government program participation and eligibility data, purchases of specific goods and services, time use, education characteristics, illnesses, disability, health, types and incidence of crime, science/engineering work force, housing, poverty, and health insurance coverage. Measures produced by our economic sample surveys include dollar value of retail sales, inventory, wholesale trade activities, receipts and revenues for service industries, manufacturing, construction, finances, state and local tax revenues, state and local government employment, imports, exports, transportation, communications, and utilities."
Digital transaction data are ubiquitous and growing. Financial transactions once made with paper checks or cash are now made with credit or debit cards, or increasingly with smartphones. Online news and blogs are replacing newspapers, search engines record trends in people’s interests, and social media shows what people are discussing. Smartphone GPS data provide traffic congestion information for Google Maps. E-commerce transactions signal what items cost and which demographic groups are buying them. Local governments are making data publicly available through Internet APIs. Internet use is growing; there are more devices connected to the Internet in the U.S. than there are people. Smartphone use continues to grow, and the trend is not expected to reverse.
Can we ignore this growing ocean of digital data? Without attempting a formal definition of Big Data, we present a few impressionistic comparisons between official statistics that result from censuses, sample surveys, administrative records, and statistical modeling on the one hand and Big Data on the other.
Users have requested many statistics at small geographic levels that might be improved with Big Data. These include small-area housing and construction data, such as housing permits, housing sales, foreclosures, housing values, property taxes, construction starts, and commercial construction values. There are also requests for business activity data, including retail sales, durable goods sales, data on business clusters and supply chains, interest in small-business start-ups, local government sales, and shipping activity (e.g., barge, shipping, rail, trucking, FedEx, and UPS). Finally, there are many requests for small-area estimates of health and other social data, including educational participation, crime, health behavior, and disease spread (e.g., flu, heart disease, ADHD, cancer).
Business analysts argue that data released close to the events (or the period of measurement) they describe are especially valuable. After Hurricane Katrina and Hurricane Sandy, users requested timely information about the impact on economic activity and about reconstruction costs. Similarly, timely small-area data on occupational and business activity during the recent recession might have helped target stimulus funds.
Using Big Data, the Census Bureau might be able to release preliminary estimates much closer to the time of an event. These estimates would then need to be revised once benchmarked against data from designed sample surveys. Candidates for this kind of preliminary release include county business patterns, start-ups of minority- or women-owned small businesses, housing sales and foreclosures, and transportation data, such as mass transit and bicycle traffic, that would reflect seasonal changes in transportation patterns.
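The post does not say how preliminary Big Data estimates would be base-lined against survey data. One simple, widely used idea is pro-rata benchmarking: scale every preliminary sub-period value by a single factor so that the values sum to the survey-based total, keeping the seasonal pattern from the Big Data source while adopting the survey's level. The sketch below assumes that approach. The function name `prorata_benchmark` and all figures are hypothetical. Agencies often prefer smoother methods such as Denton benchmarking, which avoid abrupt breaks where one benchmark period meets the next.

```python
from typing import Sequence


def prorata_benchmark(preliminary: Sequence[float], benchmark_total: float) -> list[float]:
    """Hypothetical helper: scale preliminary sub-period estimates to a benchmark total.

    Every preliminary value is multiplied by the same factor,
    benchmark_total / sum(preliminary), so the relative pattern across
    sub-periods is preserved while the sum matches the benchmark.
    """
    prelim_sum = sum(preliminary)
    if prelim_sum == 0:
        raise ValueError("preliminary estimates sum to zero; cannot scale")
    factor = benchmark_total / prelim_sum
    return [value * factor for value in preliminary]


# Hypothetical example: quarterly home sales in one county estimated from
# transaction data, later benchmarked to an annual survey-based total.
quarterly_prelim = [120.0, 180.0, 200.0, 100.0]  # sums to 600
annual_survey_total = 660.0                       # scaling factor = 1.1
revised = prorata_benchmark(quarterly_prelim, annual_survey_total)
print([round(v, 1) for v in revised])  # [132.0, 198.0, 220.0, 110.0]
```

Because the same factor is applied to every quarter, the revision changes the level of the series but not its quarter-to-quarter shape.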
Other estimates might include personal discretionary income, consumer confidence, Internet usage, health insurance participation and use, disability, child care, educational issues, and college or professional school enrollment.
Arguably of utmost importance is the protection of privacy and confidentiality. There is growing public concern over privacy in the online data space, and the press has paid increasing attention to public concerns about Big Data intrusions on privacy. The concern has become important enough that in February 2012 the White House released a framework for data privacy titled Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy. The basic principles of this framework include the following:
(a) Individual Control
(b) Transparency
(c) Respect for Context
(d) Security
(e) Access and Accuracy
(f) Focused Collection
(g) Accountability