American Curse Words, Mapped
Geege Schuman stashed this in Maps
The data is a billion tweets. I wonder if people behave differently on Facebook and in real life?
Jack Grieve, a professor of forensic linguistics at Aston University in England, has been tweeting maps of the U.S., built from geotagged Twitter data, that show which swearwords we use in which parts of the country.
Diansheng Guo at the University of South Carolina collected almost a billion tweets, posted between October 2013 and November 2014 and totaling nearly 9 billion words. Here’s how Grieve explained what happened once the data was collected:
For any word (e.g. fuck) we measure its relative frequency in each county by dividing the total number of occurrences of that word in that county by the total number of words in that county.
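The relative-frequency step is plain division. Here is a minimal Python sketch of it, assuming tweets arrive as hypothetical (county, text) pairs and using a crude regex tokenizer; neither detail comes from Grieve's description.

```python
import re
from collections import Counter


def relative_frequencies(tweets, target):
    """Relative frequency of `target` in each county.

    `tweets` is an iterable of (county, text) pairs. This input format and
    the tokenizer below are illustrative assumptions, not the original pipeline.
    """
    target = target.lower()
    total_words = Counter()  # county -> number of tokens
    target_hits = Counter()  # county -> occurrences of the target word
    for county, text in tweets:
        # Crude tokenizer (assumption): lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        total_words[county] += len(words)
        target_hits[county] += words.count(target)
    # Occurrences of the word in the county / all words in the county.
    return {
        county: target_hits[county] / n
        for county, n in total_words.items()
        if n > 0
    }


sample = [
    ("County A", "What the fuck!"),
    ("County A", "hello there"),
    ("County B", "no swearing here at all"),
]
print(relative_frequencies(sample, "fuck"))
# {'County A': 0.2, 'County B': 0.0}
```

County A has 5 words with one hit, so its rate is 0.2; County B has 5 words and none.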
We take that raw map and smooth it using a hot spot analysis (a Getis-Ord Gi local spatial autocorrelation analysis).
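The smoothing step can be sketched the same way. This assumes the Gi* variant of the Getis-Ord statistic (each county counted in its own neighborhood), which is the form many hot spot tools report; the source names a Getis-Ord Gi analysis but not the exact variant or how neighboring counties were weighted, so both are assumptions. The `line_weights` helper is hypothetical, a toy stand-in for real county adjacency or distance weights.

```python
import math


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-score for every location.

    values:  one number per county (e.g. a word's relative frequency)
    weights: n x n matrix; weights[i][j] is the spatial weight linking
             county i to county j. Setting weights[i][i] > 0 gives the Gi*
             variant, where a county is part of its own neighborhood.
    """
    n = len(values)
    mean = sum(values) / n
    # Global (population) standard deviation; max() guards against tiny
    # negative values from floating-point rounding.
    s = math.sqrt(max(0.0, sum(x * x for x in values) / n - mean * mean))
    z_scores = []
    for row in weights:
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        # Weighted neighborhood sum minus its expected value with no clustering.
        numerator = sum(w * x for w, x in zip(row, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # Degenerate cases (constant values, or a neighborhood covering
        # every county) get a z-score of 0.
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


def line_weights(n):
    """Hypothetical toy weights: counties in a row, each linked with weight 1
    to itself and its immediate neighbors."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]


rates = [0.0, 0.0, 0.0, 1.0, 1.0]
print([round(z, 3) for z in getis_ord_gi_star(rates, line_weights(5))])
# [-1.333, -2.0, -0.333, 1.333, 2.0]
```

Counties whose neighborhoods hold high rates get positive scores and those surrounded by low rates get negative ones, which is what turns a noisy county-by-county map into smooth regions.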
We map the Getis-Ord z-scores to identify clusters. Specifically, a high positive z-score means that the county is in the midst of a region where that word is relatively common, and a negative z-score means that the county is in the midst of counties where that word is less common.
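Reading clusters off the z-scores might look like the following. The function name and the 1.96 cutoff (roughly the two-tailed 95% level for a standard normal) are assumptions for illustration; the source maps the z-scores directly and does not state a threshold.

```python
def classify_hot_spots(z_scores, threshold=1.96):
    """Label each county by its Gi* z-score.

    threshold=1.96 is an assumed significance cutoff, not one from the source.
    """
    labels = []
    for z in z_scores:
        if z >= threshold:
            labels.append("hot spot")    # word relatively common in this area
        elif z <= -threshold:
            labels.append("cold spot")   # word relatively rare in this area
        else:
            labels.append("not significant")
    return labels


print(classify_hot_spots([-2.5, -1.0, 0.0, 1.5, 2.2]))
# ['cold spot', 'not significant', 'not significant', 'not significant', 'hot spot']
```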
This is LITERALLY a map of who gives a fuck:
Orange regions are more likely to give a fuck on Twitter.
Blue regions are more likely to give zero fucks on Twitter.
Net-net: People not on the coasts are more likely to give zero fucks.