Big Data has had a fast career as a new topic (and occasionally a buzzword) in the social sciences in recent years. It describes new opportunities for research emerging from the unprecedented and largely untapped quantity, immediacy, and variety of data produced by modern information technology. A lot of this data is spatially referenced, and might thus also inform planning and urban policy, despite potential methodological questions and ethical risks.
Batty (2013) notes humorously that data starts to be “big” if it “cannot fit into an Excel spreadsheet” (for those of you who wonder: that is 1,048,576 rows × 16,384 columns, or roughly 17.2 billion cells). But Big Data isn’t new merely because of its sheer quantity. As summarised by Kitchin (2014), it is usually also:
- high in velocity, being created in or near real-time;
- diverse in variety, being structured and unstructured in nature;
- exhaustive in scope, striving to capture entire populations or systems (n = all);
- fine-grained in resolution and uniquely indexical in identification;
- relational in nature, containing common fields that enable the conjoining of different data sets;
- flexible, holding the traits of extensionality (can add new fields easily) and scaleability (can expand in size rapidly).
I would also add that most of it covers longer timespans and is geo-referenced (which makes it particularly valuable to planners). Certainly one of the most striking examples is geotagged user data generated by smartphone apps, where the change in scale from “small data” is very apparent.
Which advantages and disadvantages are associated with Big Data for researchers in planning?
One interesting example is the log files of search engines: they can be a tool to discover the rise and fall of theoretical concepts (e.g. via Google Trends, or in a similar way via the Ngram Viewer, which charts the occurrence of keywords in all books digitised by Google).
Many other useful sources of Big Data are of a social nature as well. But Big Data research isn’t necessarily all about the dataset itself: it can also supplement classical “small data” studies. For instance, in a research project analysing migration patterns in the Munich region in Germany, we integrate user-generated data on rental and sale prices from one of the biggest German online real estate marketplaces, comprising millions of records on a fine-grained temporal and geographical basis, to connect residential choices to prices.
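To make the idea of linking price data to “small data” a bit more concrete, here is a minimal sketch of such a join. It is not the code of our project: the records, the field layout and the function names `median_rent_by_cell` and `join_on_cell` are all made up for illustration, and I assume that both sources share a municipality name and a year as common fields.

```python
from collections import defaultdict
from statistics import median

# Hypothetical listings: (municipality, year, monthly rent in EUR per m²).
# The values are invented for illustration only.
listings = [
    ("Munich", 2012, 14.0),
    ("Munich", 2012, 15.0),
    ("Munich", 2012, 16.0),
    ("Freising", 2012, 9.5),
    ("Freising", 2012, 10.5),
]

# Hypothetical "small data" records: (municipality, year, net in-migration).
migration = [
    ("Munich", 2012, 1200),
    ("Freising", 2012, 150),
    ("Dachau", 2012, 90),
]


def median_rent_by_cell(rows):
    """Aggregate individual listings to one median rent per (place, year)."""
    grouped = defaultdict(list)
    for place, year, rent in rows:
        grouped[(place, year)].append(rent)
    return {cell: median(rents) for cell, rents in grouped.items()}


def join_on_cell(migration_rows, rents):
    """Left join: keep every migration record, attach a rent if one exists."""
    joined = []
    for place, year, net in migration_rows:
        joined.append({
            "municipality": place,
            "year": year,
            "net_migration": net,
            "median_rent": rents.get((place, year)),  # None if no listings
        })
    return joined


rents = median_rent_by_cell(listings)
table = join_on_cell(migration, rents)
for row in table:
    print(row)
```

The left join is a deliberate choice in this sketch: municipalities without any listings stay in the table with a missing price instead of silently dropping out, which makes gaps in the user-generated data visible.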
These examples are based on data that was made public on a voluntary basis by corporations. However, many of the impressive visualisations and applications of Big Data we have seen in recent years are based on publicly available data, ranging from Twitter messages to government open data.
The swarm intelligence of internet users can also be actively tapped for urban research, as in the MIT project PlacePulse, which shows visitors to its website two random Google Street View images and asks them to select which one looks safer. The project seeks to map urban perceptions in a “gamified” way to incentivise data collection and has generated more than a million records.
I adapted the idea on a small scale and programmed a similar version for comparisons within London (without the design bit…) with interesting results: based on a much smaller dataset, safety perception seems to be more fine-grained and detached from the common ‘mind-map’ of safe and less safe areas. Besides the expected results around wealthy areas like Pimlico and Bayswater, some of the most ambient areas like Hyde Park were in fact rated average, while parts of Brixton show a remarkably high share of ratings as “safe”.
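For readers wondering how a “share of ratings as safe” can be derived from such pairwise comparisons, here is a minimal sketch. It is neither PlacePulse’s scoring method nor my original script: the votes, the location labels and the function name `safe_share` are hypothetical, and I simply assume that a location’s score is the fraction of its appearances in which it was picked as the safer image.

```python
from collections import Counter

# Hypothetical votes: (image shown left, image shown right, image chosen as safer).
# Labels are placeholders, not real places or real results.
votes = [
    ("loc_a", "loc_b", "loc_a"),
    ("loc_a", "loc_c", "loc_a"),
    ("loc_b", "loc_c", "loc_c"),
    ("loc_b", "loc_a", "loc_a"),
    ("loc_c", "loc_b", "loc_c"),
]


def safe_share(pairwise_votes):
    """Return, per location, the share of its comparisons it won as 'safer'."""
    shown = Counter()   # how often each location appeared in a comparison
    chosen = Counter()  # how often each location was picked as safer
    for left, right, winner in pairwise_votes:
        if winner not in (left, right):
            raise ValueError("winner must be one of the two images shown")
        shown[left] += 1
        shown[right] += 1
        chosen[winner] += 1
    return {loc: chosen[loc] / shown[loc] for loc in shown}


shares = safe_share(votes)
for loc, share in sorted(shares.items()):
    print(f"{loc}: rated safer in {share:.0%} of its comparisons")
```

With few votes per location such shares are noisy, so on a small dataset they are better read as a first impression than as a ranking.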
City administrations increasingly try to tap the collective knowledge of their citizens as well: apps like Commonplace allow users to report their ideas on local issues, from planning proposals to faulty street lamps.
Nevertheless, the most obvious use of this user-generated open data in planning so far is probably in maps (like OpenStreetMap) and 3D city models (e.g. Google SketchUp and Earth). Following the Wikipedia method, a large number of users simultaneously create datasets that are often cheaper, more up to date and more detailed than official government data, though not always spatially comprehensive or reliable. For design studios in planning practice as well as for localised research, however, they can improve the knowledge base.
Many discussions of Big Data in relation to cities feature the “Smart City” as a new leitmotif. In essence, the Smart City can be described as an engineered and quantified version of the Sustainable City, with sub-components ranging from Smart People, Mobility or Living to Smart Environment, Economy and e-Government. From a planning angle, the discussion usually revolves around equipping cities with more sensors and linking them (“internet of things”) to better understand and control the cities’ utilities and functioning. The most-cited example is certainly traffic optimisation, and the enthusiasm with which it is discussed sometimes resembles the optimism of 1950s modernist and functionalist planning.
This points to the drawbacks of Big Data: while some scientists provocatively proclaim “the end of theory”, in which hypotheses become useless because the data itself immediately reveals all important patterns and correlations, others stress that research questions and an awareness of potential biases, grounded in theory, are still needed to make sense of the data. Correlation does not imply causation. Especially in relation to cities, the first view results in an analysis that is too functionalist and “ignores the effects of culture, politics, policy, governance and capital” (Kitchin 2014).
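A small experiment illustrates why patterns found without a hypothesis deserve caution. The sketch below is my own illustration, not taken from Batty or Kitchin: it generates purely random, unrelated variables and counts how many pairs still look strongly correlated. The number of variables, the number of observations and the threshold of 0.5 are arbitrary assumptions.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


random.seed(0)          # fixed seed so the run is repeatable
N_VARIABLES = 40        # assumed number of unrelated indicators
N_OBSERVATIONS = 20     # assumed number of observations per indicator
THRESHOLD = 0.5         # assumed cut-off for a "strong" correlation

# Every variable is independent noise, so no pair is causally related.
variables = [
    [random.gauss(0, 1) for _ in range(N_OBSERVATIONS)]
    for _ in range(N_VARIABLES)
]

pairs = list(itertools.combinations(range(N_VARIABLES), 2))
strong = [
    (i, j) for i, j in pairs
    if abs(pearson(variables[i], variables[j])) > THRESHOLD
]

print(f"{len(strong)} of {len(pairs)} pairs exceed |r| > {THRESHOLD}")
```

Because the number of pairs grows roughly with the square of the number of variables, a dataset with many fields will produce some “strong” correlations by chance alone, which is one reason research questions and theory remain necessary.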
Big Data can render studies possible that were too expensive or impractical before, but ethical and privacy limitations come to the fore: While government and corporate open data initiatives increase transparency, users must retain the right to opt out of data collection, especially for the use of basic public services.
- Batty, Michael (2013): Big data, smart cities and city planning. In: Dialogues in Human Geography 3(3), 274-279.
- Kitchin, Rob (2014): Big Data, new epistemologies and paradigm shifts. In: Big Data & Society, April/June 2014, 1-12.