Read an article this week from Technology Review on The Emerging Science of Computational Anthropology. It was about the use of raw social media feeds to study the patterns of human behavior and how they change over time. In this article, they had come up with some heuristics that could be used to identify when people are local to an area and when they are visiting or new to an area.
Also, this past week there was an article in the Economist about Mining for Tweets of Gold about the startup DataMinr that uses raw twitter feeds to supply information about what’s going on in the world today. Apparently DataMinr is used by quite a few financial firms, news outlets, and others and has a good reputation for discovering news items that have not been reported yet. DataMinr is just one of a number of commercial entities doing this sort of analysis on Twitter data.
A couple of weeks ago I wrote a blog post on Free Social and Mobile Data as a Public Good. In that post I indicated that social and mobile data should be published, periodically in an open format, so that any researcher could examine it around the world.
Anthropology is the comparative study of human culture and condition, both past and present. Their are many branches to the study of Anthropology including but not limited to physical/biological, social/cultural, archeology and linguistic anthropologies. Using social media/mobile data to understand human behavior, development and culture would fit into the social/cultural branch of anthropology.
I have also previously written about some recent Computational Anthropological research (although I didn’t call it that), please see my Cheap phones + big data = better world and Mobile phone metadata underpins a new science posts. The fact is that mobile phone metadata can be used to create a detailed and deep understanding of a societies mobility. A better understanding of human mobility in a region can be used to create more effective mass transit, more efficient road networks, transportation and reduce pollution/energy use, among other things.
Social media can be used in a similar manner but it’s more than just location information, and some of it is about how people describe events and how they interact through text and media technologies. One research paper discussed how tweets could be used to detect earthquakes in real time (see: Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors).
Although the location information provided by mobile phone data is more important to governments and transportation officials, it appears as if social media data is more important to organizations seeking news, events, or sentiment trending analysis.
Sources of the data today
Recently, Twitter announced that it would make its data available to a handful of research organizations (see: Twitter releasing trove of user data …).
On the other hand Facebook and LinkedIn seems a bit more restrictive in allowing access to their data. They have a few data scientists on staff but if you want access to their data you have to apply for it and only a few are accepted.
Although Google, Twitter, Facebook, LinkedIn and Telecoms represent the lions share of social/mobile data out there today, there are plenty of others sources of information that could potentially be useful that come to mind. Notwithstanding the NSA, currently there is limited research accessibility to the actual texts of mobile phone texts/messaging, and god forbid, emails. Although privacy concerns are high, I believe ultimately this needs to change.
Imagine if some researchers had access to all the texts of a high school student body. Yes much of it would be worthless but some of it would tell a very revealing story about teenage relationships, interest and culture among other things. And having this sort of information over time could reveal the history of teenage cultural change. Much of this would have been previously available through magazines but today texts would represent a much more granular level of this information.
Archeology is just anthropology from a historical perspective, i.e, it is the study of the history of cultures, societies and life. Computational Archeology would apply to the history of the use of computers, social media, telecommunications, Internet/WWW, etc.
There are only few resources that are widely available for this data such as the Internet Archive. But much of the history of WWW, social media, telecom, etc. use is in current and defunct organizations that aside from Twitter, continue to be very stingy with their data.
Over time all such data will be lost or become inaccessible unless something is done to make it available to research organizations. I believe sooner or later humanity will wise up to the loss of this treasure trove of information and create some sort of historical archive for this data and require companies to supply this data over time.
Photo Credit(s): State of the Linked Open Data Cloud (LOD), September 2011 by Duncan Hull