Data scientists who are not familiar with privacy measures often assume that anonymization happens before, and independently of, any further analysis, and that the anonymized data can then be used for any desired use case. The opposite is true: the better you know your use cases, the more precisely you can identify and apply suitable privacy-preserving methods to your data. (Note: Entirely anonymous data is hard, or even impossible, to achieve, as has been demonstrated repeatedly when personal information was extracted from supposedly anonymous data. “Privacy-enhancing” and “privacy-preserving” are “softer” terms used in this context that do not claim to have reached full anonymity.)
To give a simplified example: if you want to know where most shared bicycles are rented, your privacy method should preserve the spatial distribution of renting locations but you can remove information on exact trajectories as a part of your privacy enhancement. On the other hand, if you are interested in knowing how fast bikes ride on different streets you should maintain information on the exact route and speed but you can cut off start and end locations to enhance users’ privacy. If you want to keep information about all attributes, either your level of privacy suffers or the utility does.
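To make the bike-sharing example a bit more concrete, here is a minimal Python sketch of the two treatments. It is only an illustration under my own assumptions: the function names, the 0.01-degree grid, and the 200 m cut-off radius are hypothetical choices, not recommended parameters, and neither function gives any formal privacy guarantee.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the index of a coarse grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def rental_counts(start_points, cell_deg=0.01):
    """Use case 1: keep only the number of rentals per grid cell.

    Trajectories are not needed at all, so they are never stored.
    """
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in start_points)

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    radius = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

def trim_ends(track, radius_m=200.0):
    """Use case 2: keep the route, but drop every point that lies
    within radius_m of the first or last point of the track."""
    if not track:
        return []
    start, end = track[0], track[-1]
    return [p for p in track
            if haversine_m(p, start) > radius_m and haversine_m(p, end) > radius_m]
```

The first function keeps the spatial distribution of rentals and nothing else; the second keeps the middle part of each route and discards the sensitive start and end. Which of the two is appropriate depends entirely on the use case.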
Thus, to find suitable privacy-enhancing methods, it is vital to specify, as a first step, the use cases for which the data is needed. This is also the approach we use to research privacy-preserving methods in our FreeMove project and the one I will use to structure my dissertation.
So, what is mobility data actually needed for?
It is obvious that mobility data can be beneficial, e.g., for urban planning. It is less obvious which exact purposes are pursued with which kind of data and what level of granularity is needed. This is why I want to start a collection of urban mobility use cases and examples to obtain a better overview of the data formats and granularity needed, as a basis for defining suitable privacy methods. Specifically, I am interested in use cases that serve the interests of citizens, omitting use cases that only serve marketing purposes or company interests. Additionally, the listed use cases are all based on the collection of larger amounts of data for further analysis or modelling, unlike applications that need precise, real-time location information of a single user for their operation, e.g., a navigation app.
As there is a variety of use cases, I will split them into three blog posts, grouped into (preliminary) categories Urban and Traffic Planning, Traffic Management and Routing, and Other Use Cases, starting below with the first.
I do not claim to have created an exhaustive collection and would be pleased to receive hints on further use cases.
You can find a tabular overview of all use cases in this GitHub Repository.
Urban and Traffic Planning
One of the most frequently stated reasons for needing fine-grained human mobility data is urban and traffic planning. So let’s have a deeper look at what that actually means.
Traditionally, city and traffic planners use simulation software to test different scenarios, e.g., the impact on traffic of a construction site, the opening of a new shopping mall (which attracts many people), or a new bus line. Software like PTV Visum implements the traditional four-step travel demand model, which uses information on origin-destination connections between traffic cells on a regular workday for simulations. Such origin-destination information is traditionally gathered through surveys of representative samples like “Mobilität in Deutschland”, which is conducted every four years.
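As a rough illustration of what “origin-destination connections between traffic cells” means as a data format, the following Python sketch counts trips per pair of cells. The cell labels and function names are made up for this example; this is not how PTV Visum or any survey stores its data.

```python
from collections import Counter

def od_matrix(trips):
    """Count trips per (origin_cell, destination_cell) pair.

    trips: iterable of (origin_cell, destination_cell) tuples,
    e.g. derived from survey responses for a regular workday.
    """
    return Counter((origin, dest) for origin, dest in trips)

def trips_per_origin(matrix):
    """Sum the outgoing trips of every origin cell."""
    totals = Counter()
    for (origin, _dest), n in matrix.items():
        totals[origin] += n
    return totals

# Hypothetical traffic cells "A", "B", and "C"
trips = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C")]
matrix = od_matrix(trips)
print(matrix[("A", "B")])             # number of trips from A to B
print(trips_per_origin(matrix)["A"])  # all trips starting in A
```

Note that such a matrix contains no individual routes or timestamps, only counts per pair of cells, which already says a lot about the granularity this use case needs.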
Newer approaches include agent-based simulations like MATSim. They need day plans as input data for their agents. A day plan contains a detailed list of activities with temporal information and traffic mode. The plans describe the intentions of agents, which might not be realized if they are too optimistic, e.g., due to traffic jams. Such day plans are usually also based on survey data (see, e.g., Kickhöfer et al., 2016) or data on commuting statistics (see, e.g., Ziemke et al., 2019).
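To give an idea of the level of detail a day plan carries, here is a simplified Python sketch of such a structure. The class and field names are my own illustration and deliberately not the input schema of any particular simulation framework.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Activity:
    kind: str                        # e.g. "home", "work", "shopping"
    location: Tuple[float, float]    # coordinates of the activity
    end_time: Optional[str] = None   # "HH:MM"; None for the last activity

@dataclass
class Leg:
    mode: str                        # e.g. "bike", "car", "pt"

@dataclass
class DayPlan:
    agent_id: str
    activities: List[Activity] = field(default_factory=list)
    legs: List[Leg] = field(default_factory=list)

    def is_consistent(self) -> bool:
        """A plan with n activities needs n - 1 legs connecting them."""
        return len(self.activities) >= 1 and len(self.legs) == len(self.activities) - 1

    def modes(self) -> List[str]:
        """Traffic modes in the order in which they are used."""
        return [leg.mode for leg in self.legs]

# A hypothetical commuter: home -> work -> home, both trips by bike
plan = DayPlan(
    agent_id="agent_1",
    activities=[
        Activity("home", (0.0, 0.0), "08:00"),
        Activity("work", (5.0, 3.0), "17:00"),
        Activity("home", (0.0, 0.0)),
    ],
    legs=[Leg("bike"), Leg("bike")],
)
```

Even this toy plan reveals a home location, a work location, and a daily rhythm for a single agent, which is why the source of such plans matters from a privacy perspective.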
Another example of a new simulation approach is the bikeSim project, which uses real trajectories of cyclists as input data to create an affordable and easy-to-use planning tool for bike infrastructure.
With the stronger focus on climate protection and a shift from private cars to more climate-friendly mobility alternatives (“Verkehrswende”), the promotion of cycling is becoming an increasingly important (political) topic. This especially includes the expansion of bike infrastructure. The analysis of the status quo is stated to be an important step to detect current shortcomings in the bike infrastructure and help city planners prioritize fields of action.
SimRa is a project that crowd-sources data on near accidents. The OpenBikeSensor is a citizen science project that crowd-sources measurements of the passing distances of cars. Movebis uses GPS tracks from a bike app to provide cities with bike traffic volumes and average speeds that are aggregated at street level. Lu et al. compared actual bike trajectories with shortest paths to identify why cyclists avoid certain routes. Brauer et al. investigated bike traffic fluency to provide insight into the quality of city cycling. Various research and application projects use sensors to identify bumpy roads (e.g., see this survey, this “Sensorbike” project, this proof-of-concept, or this hackathon).
All these projects use GPS trajectories and sensor data to map shortcomings in the bike infrastructure onto the streets, e.g., low speed, dangerous situations, low fluency, avoided routes, or bumpiness. A different approach looks at high-demand origin-destination connections to optimize the bike infrastructure for those routes. For example, the Propensity to Cycle Tool is a web interface that uses origin-destination commuting data to visualize highly frequented routes, the share of cyclists, and routes where cycling has the greatest potential to grow.
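Mapping trajectory data “onto the streets” essentially means aggregating per street segment. The following Python sketch shows one possible way to do this for volumes and average speeds. It assumes the GPS points have already been matched to street segments, and the min_trips threshold is a hypothetical privacy-motivated addition of mine; it is not a description of how any of the projects above work.

```python
from collections import defaultdict

def aggregate_by_segment(observations, min_trips=1):
    """Aggregate map-matched observations per street segment.

    observations: iterable of (trip_id, segment_id, speed_kmh).
    Segments used by fewer than min_trips distinct trips are left out.
    """
    trips = defaultdict(set)
    speeds = defaultdict(list)
    for trip_id, segment_id, speed_kmh in observations:
        trips[segment_id].add(trip_id)
        speeds[segment_id].append(speed_kmh)

    result = {}
    for segment_id, trip_ids in trips.items():
        if len(trip_ids) >= min_trips:
            values = speeds[segment_id]
            result[segment_id] = {
                "volume": len(trip_ids),
                "mean_speed_kmh": sum(values) / len(values),
            }
    return result
```

The output no longer contains any individual trajectory, only one volume and one mean speed per segment, which is all that this kind of status quo analysis needs.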
Even though there are many citizen science and research projects that deal with the question of identifying shortcomings in bicycle infrastructure, little is known about how such information is actually used by city planners in public administrations. It would be interesting to obtain more insights into actual planning processes and whether such data sources are considered or needed. Of course, next to planning practices, quantifying shortcomings in the bicycle infrastructure is a valuable source to support political arguments.
Determining accurate counts of bicycle volumes is necessary for city planners, e.g., to receive funding based on reports of bicycle use and safety improvements (Nordback et al., 2013). In Germany, such counts can be used to support the rededication of a street as a bicycle street. Cities usually use stationary counters, though they are expensive. The Open Traffic Count is a project that implements a video-based approach as a cheaper and more privacy-sensitive alternative, using the OpenDataCam.
Also, for a reliable and representative indication of traffic volumes, counters need to be placed carefully. This project uses GPS trajectory data from Strava to identify good locations for such counters.
Public transport offer
As the resources for operating public transport are limited, they need to be used smartly. Public transport offers, such as routes of lines, frequencies, and service times, can be optimized by considering demand data.
Public transport companies, therefore, make use of traffic models (see above) as well as analyses of mobile phone data, or automatic passenger counting (e.g., with sensors by providers like Dilax or Iris). Routing queries to public transport applications also give insights into demands and can be used to evaluate current public transport offers (e.g., providers like Hacon offer such services). Researchers also investigate how machine learning models can be used to predict public transport demand for special events and evaluate system performance (Santanam et al., 2021). Others research demand-based real-time adjustments for autonomous vehicles (Cao et al., 2019).
Public transport and private vehicles are increasingly complemented by shared mobility offers (bike, e-scooter, moped, or car sharing). Mobility data can help with the demand-based positioning of docking stations and parking spots, or the (re-)distribution of vehicles based on demand.
Even though a demand-based approach can help to grow these services, a city should be careful not to base its strategy only on current demand but, e.g., also to complement the public transport offer in poorly connected regions. Also, demand data only reflects the status quo, and demand might change if other factors, e.g., better infrastructure, change. Therefore, fine-granular demand data might be important for profit-oriented mobility service providers but should be used carefully for strategic purposes by city planners.
City administrations gather information about the use of shared micro-mobility and monitor its development over time. The Mobility Data Specification (MDS) has been established as a standard for micro-mobility that more and more mobility service providers use to share data. The specification includes start and end locations of rentals but also goes beyond human mobility data, e.g., the charging status of vehicles or the definition of prohibited parking zones (for a use case overview, see this article and this MDS use case gallery). Service providers such as Vianova, Populus, or Remix collect this data from the providers and offer cities tools that can access the combined data of multiple providers for monitoring, analysis, and management purposes.