List of Public Data Sources Fit for Machine Learning from BigML

Below is a wealth of links to free and open datasets that can be used to build predictive models. We hope that our readers will make the best use of them by gaining insight into the way the world and our governments work, for the sake of the greater good. If you have an academic or research project, please keep in mind that BigML offers special discounts and free access for such projects.

Data Journals
Data-artikelen | Sargasso
Data journalism and data visualization from the Datablog | News | The Guardian

Data Marketplaces and Data Hubs
Knoema – Home
Public Data Sets : Amazon Web Services
Socrata
Data Publica | Les données pour votre business
Archive-It – Web Archiving Services for Libraries and Archives
Freebase
Google Public Data Explorer
Welcome – the Data Hub
Data Sets | AggData
Find & Purchase Data Subscriptions | Windows Azure Marketplace
Factual | Home


Data Search Engines
Zanran Numerical Data Search
Quandl – Intelligent Search for Numerical Data


International Bodies & Agencies
IMF Data and Statistics
Data | The World Bank
OECD.Stat
UNdata
Data and maps – European Environment Agency (EEA)
Eurostat Home


Local Government
Inicio Misiones
Open Government Data Wien (OGD)
Open data – City of Brussels
Open Data – Brisbane City Council
Open data – Salford City Council
Sunderland City Council : Local Public Data
Welcome to the London Datastore | London DataStore
Leeds City Council – Open Data
Home – DataGM – Data Greater Manchester
Open Data | Derby City Council
Council data – Brighton & Hove City Council
Open Data – Birmingham City Council
Aberdeen City Council Open Data
Open Data – City of Waterloo
Open Data catalogue | City of Vancouver
Open Data Home – Open Data – Home | City of Toronto
City of Prince George – Open Data Catalogue
Open Data Ottawa | City of Ottawa
Open Data Catalogue – City of Red Deer
Open Data | City of Niagara Falls, Canada
Open Data Catalogue | City of Nanaimo – Residents – Publications and Open Data Catalogue
City of Medicine Hat Open Data Catalogue
Kamloops open data
Open Data Catalogue Kelowna
City of Hamilton – Open Data
City of Fredericton – Open Data Home
City of Edmonton Open Data Catalogue
City of Somerville, MA
Data.Seattle.Gov | Seattle’s Data Site
City of Scottsdale
Welcome – Santa Cruz Open Data
Data | San Francisco
Open Raleigh – The Official City of Raleigh Portal
Datasets | Portland OR
OpenDataPhilly – Connecting People With Data
NYC Open Data
Greater New Orleans Community Data Center
City of Madison | Open Data
City and County of Honolulu
US/Data Catalog District of Columbia
Denver Open Data Catalog | The Cook County Government Open Data Website
City of Chicago | Data Portal
Open Government | City of Boston
OpenBaltimore / City of Baltimore’s Open Data Catalog | Open Austin
OpenDataAsheville – Connecting People With Data
US/Arvada
GovHK: About Data.One
Singapore


Machine Learning Challenges
ACM KDD CUP
Competitions – Kaggle
Data – Repository – Causality Workbench
TunedIT – Data mining & machine learning data sets, algorithms, challenges


Machine Learning Datasets
TunedIT – Data mining & machine learning data sets, algorithms, challenges
mldata :: Welcome
UCI Machine Learning Repository: Data Sets


Miscellaneous Data Sources
IHME | Institute for Health Metrics and Evaluation
Gapminder: Unveiling the beauty of statistics for a fact based world view.
Doing Research in New York City Public Schools and Requesting Data – NYC Data – New York City Department of Education
RITA | BTS | Title from h2
Oregon Climate Data
Quantnet :: Start
Data Tools – Locators
My Data | Measured Me
Webscope from Yahoo! Labs
Research Data
Online Data – Robert Shiller
Obtaining Data From the NSSDC
Cancer Program Data Sets
Million Song Dataset | scaling MIR research
Google Ngram Viewer
Data | GeoDa Center
Home – GEO DataSets – NCBI
The Financial Data Finder A – G
Frequent Itemset Mining Dataset Repository
Europeana Professional – Linked Open Data
Inforum – EconData
Summary of Data Sets by Application Area
Data Sets | Pew Research Center’s Internet & American Life Project
Cosm – Explore
Advanced NFL Stats: Play-by-Play Data


National Governments and States
Portal de Obligaciones de Transparencia Junta de Andalucía – Datos abiertos Reutilización de la Información del Sector Público | Reutilización de la Información de los Servicios Públicos Portal de Datos Abiertos de JCCM Ayuntamiento de Zaragoza. Datos de Zaragoza Reutilización Dades obertes Lleida – Ajuntament de Lleida ISTAC | El ISTAC Dades Obertes. Generalitat de Catalunya Dades Obertes CAIB Reutilización de la Información del Sector Público en Gijón Open Data Euskadi ataria, Eusko Jaurlaritzaren datu publikoen irekitzea Data for Hawaii | Florida Has A Right To Know Commonwealth Data Point Open Data | Connecticut Transparency Website Open Data NYS Data Center DataShare State of Alabama – Open Government for the State of Tennessee | Government | State Facts and History OpenDoor – Kentucky | Open Illinois SOM – Michigan Data Store Louisiana Transparency and Accountability Portal | State of Missouri Data Portal DATAshare | Minnesota open data // your portal for Minnesota data transparency Open Data Texas Welcome to Oklahoma’s Official Web Site KanView: Kansas Transparency Taxpayer Act – Kansas Revenues and Expenditures Search OPEN SD :: South Dakota Government Information North Dakota GIS (Geographic Information Systems) State Government Data New Mexico The Official State Web Portal Arizona OpenBooks | – Arizona Transparency Finances in Detail Utah Data – | Data Transparency for the State of California Oregon Data | Opening Oregon’s Data Data.Washington | Washington State’s Data Site Home | Portal de Datos Públicos – Inicio | Portal del Estado Uruguayo Bem vindo – Portal Brasileiro de Dados Abertos Directorio de Empresas, Marcas registradas, Normas legales y Teléfonos en Perú – The Portal to Ireland’s Official Statistics | The Belgian open data initiative het open dataportaal van de Nederlandse overheid PortalU – German Environmental Information Portal Statistical database | Portalul datelor guvernamentale deschise al Republicii Moldova Offene Daten Österreich | Vitajte – | I dati aperti della PA Δημοσια, Ανοικτά Δεδομένα Open Kenya | Transparent Africa SAUDI | National e-Government Portal – Home – New Zealand government data online » 국가공유자원포털 中国政府公开信息整合服务平台 Open Data Canada OpenAid – Start | Åpne offentlige data i Norge – Difi Portada | Open Data Colombia home |


Open Companies Data Sources
Yelp’s Academic Dataset | Yelp
Data Export – Prosper
Lending Club Statistics – Lending Club


U.S. Agencies Data Sources
Federal Agency Participation |
FRB: Data Download Program (DDP)


Various Lists of Data Sources
Programming Challenges: What are some good “toy problems” in data science? – Quora
Data: Where can I find large datasets open to the public? – Quora
Data Analysis: What’s your favorite free data source? – Quora
What are some publicly available market data feeds? – Quora
Is there a reliable free source for per country LinkedIn statistics? – Quora
@pskomoroch #dataset – Delicious
Free, Public Data Sets | Hacker News
Datasets « Kevin Chai’s Homepage
List of European Open Data Catalogues at Open Data
Datasets Archive
Some Datasets Available on the Web » Data Wrangling Blog


Research Quality Datasets by Hilary Mason
Lending Club Loan Data
SMS Spam Collection
Flickr personal taxonomies
Yahoo Data for Researchers
ICWSM Spinnr Challenge 2011 dataset
Quantum Chaotic Thoughts: Facebook100 Data Set
Public Data Sets on Amazon Web Services (AWS)
The ClueWeb09 Dataset
Census Bureau Home Page
Data | The World Bank
ImageNet
What is Twitter, a Social Network or a News Media? – WWW’10
dotbot | help – arXiv Bulk Data Access – Amazon S3
YouTube Dataset
Face Recognition Homepage – Databases
Pajek datasets
UCI Network Data Repository
Datasets for “The Elements of Statistical Learning”
Enron Email Dataset
MovieLens Data Sets | GroupLens Research
Translation Task – EMNLP 2011 Sixth Workshop on Statistical Machine Translation
Project Gutenberg
About WordNet – WordNet – About WordNet
Aligned Hansards of the 36th Parliament of Canada
CRCNS – Collaborative Research in Computational Neuroscience – Data sharing
USENET corpus
UniGene
ChEMBLdb
UCI Machine Learning Repository
Gene Expression Omnibus (GEO) Main page
Social Science Data
IMDB dataset
Stanford Large Network Dataset Collection
Google Books n-gram dataset
Million Song Dataset | scaling MIR research
Belly Button Biodiversity 2.0
Sharing PyPi/Maven dependency data « RTFB
Click Dataset | Center for Complex Networks and Systems Research
The Electric Rice Cooker – One year of deleted weibos archive
Registered meteorites that has impacted on Earth visualized – AnalyticBridge
GeoJSON files for real-time Virginia transportation data.
NYPD Crash Data Band-Aid
11 Billion Clues in 800 Million Documents: A Web Research Corpus Annotated with Freebase Concepts | Research Blog
Big data set – 3.5 billion web pages – made available for all of us – Big Data News
Data.Seattle.Gov | Seattle’s Data Site
New Crawl Data Available! | CommonCrawl
Detailed data on pass rates, race, and gender for 2013
Data Download


Whew that's a lot of sources!