GDELT is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world - API available
The Global Database of Events, Language, and Tone (GDELT) is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world over the last two centuries down to the city level globally, to make all of this data freely available for open research, and to provide daily updates to create the first "realtime social sciences earth observatory." Nearly a quarter-billion georeferenced events capture global behavior in more than 300 categories covering 1979 to present with daily updates.
GDELT is designed to help support new theories and descriptive understandings of the behaviors and driving forces of global-scale social systems from the micro-level of the individual through the macro-level of the entire planet by offering realtime synthesis of global societal-scale behavior into a rich quantitative database allowing realtime monitoring and analytical exploration of those trends.
GDELT's goal is to help uncover previously-obscured spatial, temporal, and perceptual evolutionary trends through new forms of analysis of the vast textual repositories that capture global societal activity, from news and social media archives to knowledge repositories.
"Reduced" 1979-2012 Dataset (Events)
This version of the event data contains only a subset of the data fields for each record and uses the "one-a-day" country-level filtering commonly used in previous country-level event datasets that did not have the city-level geographic resolution of GDELT. This version of the data will most closely match what users with previous event analysis experience are used to working with in terms of aggregation level, but collapses the database on BASEATTRS+SOURCE+TARGET+EVENTCODE. This means that this version of the data often collapses multiple separate riots on the same day in different cities into a single record, making it much more difficult to trace city-level spatial patterns in unrest.
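The effect of this collapsing can be illustrated with a minimal sketch. The field names, actor labels, and records below are hypothetical and are not the actual Reduced file layout. The sketch also assumes the collapse key includes the event date, as the "one-a-day" description suggests.

```python
# Hypothetical city-level records; names and values are illustrative only.
events = [
    {"date": "20110128", "source": "PROTESTERS", "target": "GOV", "event_code": "RIOT", "city": "City A"},
    {"date": "20110128", "source": "PROTESTERS", "target": "GOV", "event_code": "RIOT", "city": "City B"},
    {"date": "20110128", "source": "PROTESTERS", "target": "GOV", "event_code": "RIOT", "city": "City C"},
    {"date": "20110129", "source": "PROTESTERS", "target": "GOV", "event_code": "RIOT", "city": "City A"},
]


def collapse_one_a_day(records):
    """Keep one record per (date, source, target, event_code) key."""
    collapsed = {}
    for rec in records:
        # Assumed collapse key: date stands in for the base attributes.
        key = (rec["date"], rec["source"], rec["target"], rec["event_code"])
        if key not in collapsed:
            # The city is dropped: the collapsed record has no sub-country location.
            collapsed[key] = {k: v for k, v in rec.items() if k != "city"}
    return list(collapsed.values())


reduced = collapse_one_a_day(events)
print(len(events), "records collapse to", len(reduced))
```

In this toy input, the three same-day riots in different cities become a single record, which is why city-level spatial patterns cannot be recovered from the collapsed data.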
The Reduced dataset comes with a set of Python scripts that makes working with the data easier. Most of the existing tutorials have been built using this dataset, and this version is the easiest to get started with. However, it lacks many of the extended fields that can be highly useful in filtering the data, such as the geographic resolution field, and it does not break out the Actor codes, so finding specific actor types requires keyword-based searches.
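A keyword-based search for an actor type might look like the sketch below. It assumes actor strings are concatenations of three-character codes (a country code followed by a role code); the records, field names, and the helper `actor_has_type` are hypothetical and are not part of the scripts shipped with the dataset.

```python
def actor_has_type(actor, type_code):
    """Return True if type_code is one of the 3-character chunks of actor."""
    chunks = [actor[i:i + 3] for i in range(0, len(actor), 3)]
    return type_code in chunks


# Illustrative records only; "AAA" and "BBB" stand in for country codes.
records = [
    {"source": "AAAGOV", "target": "BBBMIL"},
    {"source": "AAACVL", "target": "AAAGOV"},
    {"source": "BBBREB", "target": "BBBCVL"},
]

# Select records where either side of the event carries the "GOV" chunk.
gov_involved = [
    r for r in records
    if actor_has_type(r["source"], "GOV") or actor_has_type(r["target"], "GOV")
]
print(len(gov_involved), "of", len(records), "records involve a GOV actor")
```

Matching on aligned chunks rather than a plain substring avoids false hits where the keyword happens to straddle two adjacent codes.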
If you are looking for a quick way to start experimenting with GDELT, this is a good place to start, especially since it lets you explore patterns over more than three decades, from 1979 to 2012. However, if you plan to work extensively with GDELT for publication, or to use the daily event stream for watchboarding, forecasting, or other work, you may want to start with the Daily Updates files. That way you have access to all of the extended event record fields and can build your tools around that format, instead of working with the Reduced version and then having to adjust all of your tools for the different format of the Historical and Daily Updates files.
Note that this version of the data differs substantially from the Historical and Daily Updates files, and their codebook is not applicable here.
See more at: http://gdelt.utdallas.edu/data.html