Newspapers are currently not only fueling the data mining trend, they are pushing it themselves. The New York Times is leading the way with its data viz lab, and the Guardian is also very active with its data blog. Now the Wall Street Journal is picking up on the topic and has just published a number of stories on Foursquare data analysis.
Just like the NYT or the Guardian, the people at the Wall Street Journal are not just writing about it; they are collecting and processing the data themselves. It's almost like a return to the good old days of journalism, when stories were produced entirely by the newspapers, except that this time it is digital, remote and massive.
Image taken from l2thinktank / Heat map showing Foursquare check-in activity in New York between 9am and 11am on a Monday.
The WSJ project 'A Week on Foursquare' collected every single check-in on Foursquare for an entire week earlier this year. They recorded "over 10.9 million check-ins", which is "more than 1.5 million a day" and includes check-ins worldwide. New York alone accounts for 310,000+ check-ins, and San Francisco and the Bay Area for 190,000+.
Compared with the NCL Twitter data set that we collect for urban areas, this Foursquare data set seems to be quite good. With low-level access to Twitter we recorded about half as many location-based tweets as WSJ recorded check-ins on Foursquare: 160,000+ location-based tweets for the New York NCL map and 80,000+ for the San Francisco NCL. However, there are a lot more tweets out there that could potentially be drawn into a data set. WSJ claims to have collected every single Foursquare check-in, whereas with Twitter we are collecting about 1% of tweets. So there is some potential in terms of numbers. This is not to say, however, that the quality of the information would be better. It might be, but who knows.
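As a rough check on the figures quoted above, the arithmetic can be sketched in a few lines of Python. The numbers are the rounded lower bounds given in the text (the "+" is simply dropped), the variable and function names are made up for illustration, and the comparison ignores any difference in the length of the collection periods, so the ratios are only approximate.

```python
# Rounded figures quoted in the post; the "+" lower bounds are treated as exact.
WSJ_TOTAL_CHECKINS = 10_900_000   # "over 10.9 million" check-ins in one week
DAYS = 7

foursquare = {"New York": 310_000, "San Francisco": 190_000}
twitter_ncl = {"New York": 160_000, "San Francisco": 80_000}

def daily_average(total, days):
    """Average number of events per day over the collection period."""
    return total / days

def twitter_to_foursquare_ratio(city):
    """Location-based tweets collected per Foursquare check-in for one city."""
    return twitter_ncl[city] / foursquare[city]

print(f"Check-ins per day: {daily_average(WSJ_TOTAL_CHECKINS, DAYS):,.0f}")
for city in foursquare:
    print(f"{city}: {twitter_to_foursquare_ratio(city):.2f} tweets per check-in")
```

The daily average comes out slightly above 1.5 million, and the ratios are roughly 0.5 for New York and 0.4 for San Francisco, which is where "about half" comes from.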
Image taken from markwilliammann / A graph showing the popularity of the top 80 locations by percentage of check-ins. Blue and red show New York and San Francisco respectively. The list is clearly led by check-ins at the office, followed by home and coffee shops.
The people at WSJ have run all sorts of visualisations on the data, mainly focusing on distributions and rankings. Given the check-in function Foursquare offers, venues are an obvious element to look at. They concluded "the distribution of venues world-wide showed that out of the 2,197,870 venues with a category assigned, and at least one check-in during the week, 44.5% had just one check-in. Just 2,500 venues had 100 or more check-ins. The most popular had more than 13,000 over the week, and the second most had almost 7,000. Venue popularity dropped off very quickly, but had an extremely long tail. The average number of check-ins per venues was 4, and the median was 2."
This one extremely popular venue was in fact not an actual place but a fictional one in New York named Snowpocalypse 2011, related to the snowfall New York saw during the week the data was collected. Some 13,000 people checked in over a period of two days. See the weather-related details HERE.
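The kind of summary WSJ reports for venues (share of venues with a single check-in, number of busy venues, mean versus median) is easy to reproduce for any list of per-venue counts. The following is a minimal sketch only: the function name and the toy numbers are invented for illustration and are not the WSJ data or their method.

```python
from statistics import mean, median

def summarise_venue_counts(counts):
    """Summarise a list of check-in counts, one entry per venue."""
    if not counts:
        raise ValueError("need at least one venue")
    n = len(counts)
    return {
        "venues": n,
        # Fraction of venues with exactly one check-in.
        "share_single": sum(1 for c in counts if c == 1) / n,
        # Number of venues with 100 or more check-ins.
        "busy_venues": sum(1 for c in counts if c >= 100),
        "mean": mean(counts),
        "median": median(counts),
        "max": max(counts),
    }

# Made-up toy data, NOT the WSJ data: many quiet venues and one huge outlier.
toy_counts = [1, 1, 1, 1, 2, 2, 3, 5, 20, 13000]
summary = summarise_venue_counts(toy_counts)
print(summary)
```

In the toy data a single outlier pulls the mean far above the median. A milder version of the same long-tail effect would explain why the WSJ average of 4 check-ins per venue is twice the median of 2.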
On the main page you can replay the data on a map for New York and San Francisco. The visualisation is based on heat maps overlaid on Google Maps. An interesting feature is the offering of three views, each showing the same location but shifted by one hour. Using this, it's interesting to follow the build-up of activity or, at the end of the day, the rather quick drop-off in activity.
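How WSJ built their heat maps is not described, but the idea of three views shifted by one hour can be sketched with simple grid binning. Everything here is an assumption for illustration: the (lat, lon, hour) tuple format, the function names, the cell size, and the use of a plain latitude/longitude grid that ignores map projection.

```python
import math
from collections import Counter

def heat_grid(checkins, hour, cell_size=0.01):
    """Count check-ins per grid cell for one hour of the day.

    checkins: iterable of (lat, lon, hour) tuples, with hour in 0..23.
    cell_size: cell edge length in degrees (an arbitrary choice here).
    """
    grid = Counter()
    for lat, lon, h in checkins:
        if h == hour:
            cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
            grid[cell] += 1
    return grid

def shifted_views(checkins, start_hour, views=3):
    """Grids for consecutive hours, e.g. 9am, 10am and 11am."""
    return [heat_grid(checkins, (start_hour + i) % 24) for i in range(views)]

# Made-up sample points around Manhattan, for illustration only.
toy_checkins = [
    (40.755, -73.985, 9),
    (40.756, -73.986, 9),
    (40.705, -74.015, 10),
    (40.755, -73.985, 11),
]
nine, ten, eleven = shifted_views(toy_checkins, start_hour=9)
print(nine, ten, eleven)
```

Each returned grid maps a cell index to a count, which a mapping library could then colour by intensity. Comparing the three grids side by side is what makes the build-up or drop-off in activity visible.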
San Francisco New City Landscape
Image by urbanTick using the GMap Image Cutter / San Francisco New City Landscape - Use the Google Maps style zoom function in the top right corner to zoom into the map and explore it in detail. Explore areas you know close up and find new locations you have never heard of. Click HERE for a full screen view. The maps were created using our CASA Tweet-O-Meter, in association with DigitalUrban and coded by Steven Gray. This New City Landscape represents location-based Twitter activity.
Comparing the WSJ Foursquare heat maps to the NCL density maps that we produced using the Twitter data, there are a lot of similarities. In fact, the maps mostly pick up the same hot spots overall. The most notable difference is in New York, where Brooklyn is almost inactive on Foursquare, whereas on Twitter Brooklyn is as active as Manhattan. Also, the Financial District is not quite as active on Foursquare and trails behind Soho. In San Francisco the overall pattern is again similar, with the main area of activity concentrated around Market Street. Interestingly, however, there is quite a lot of activity around Golden Gate Park, especially after work between 5pm and 9pm.
These data sets will increasingly play an important role in spatial analysis, simply because they seemingly deliver the facts about activities and places. However, the stats for both Twitter and Foursquare show how little of the population is represented in these data sets. WSJ makes it quite clear in their blog post that "a survey last year showed that fewer than 5% of Americans had ever used Foursquare or its rivals, and only about 1% used such a location-based service on a daily basis." As such, beyond the visualisation there is currently little that can be conclusively said on a general level. Nevertheless, there is quite a lot of potential for projects that look at the sample as such, within a deliberately constrained context. There are enough details that can be investigated when limited to the sample.
New York New City Landscape
Image by urbanTick using the GMap Image Cutter / New York New City Landscape - Use the Google Maps style zoom function in the top right corner to zoom into the map and explore it in detail. Explore areas you know close up and find new locations you have never heard of. Click HERE for a full screen view.