Mobility data, even in the form of aggregated statistics, usually cannot be shared without raising privacy concerns. In this publication, my co-authors Saskia Nuñez von Voigt, Helena Mihaljević, and Florian Tschorsch and I aim to provide a report that compiles typical analyses of urban human mobility and offers privacy guarantees, so that it can be shared freely.
Some main issues and findings:
The issue of user-level privacy: with mobility data, a user often contributes more than a single trip. Thus, it is much more complicated to achieve user-level differential privacy. Broadly speaking, the more a user can potentially contribute to a dataset, the more noise needs to be added to maintain their privacy. More noise means less utility.
We tackle this issue by bounding user contribution, that is, by capping the number of trips each user can contribute. For our evaluated datasets, we find that many analyses have a “sweet spot” where utility is highest: the point that best balances information gain (more trips) against the noise that has to be added. Basically, this can be understood as down-sampling ‘power users’.
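To make the trade-off concrete, here is a minimal sketch of bounding user contribution before releasing a noisy trip count. It is an illustration under stated assumptions, not the implementation from our package: the function names, the `(user_id, trip)` tuple format, and the choice of the Laplace mechanism are all hypothetical. The point it shows is that the noise scale grows linearly with the per-user cap, while a lower cap discards trips of power users.

```python
import random
from collections import defaultdict


def bound_user_contribution(trips, max_trips_per_user, seed=0):
    """Keep at most `max_trips_per_user` randomly sampled trips per user.

    `trips` is a list of (user_id, trip) tuples (hypothetical format).
    """
    rng = random.Random(seed)
    trips_by_user = defaultdict(list)
    for user_id, trip in trips:
        trips_by_user[user_id].append(trip)

    bounded = []
    for user_id, user_trips in trips_by_user.items():
        if len(user_trips) > max_trips_per_user:
            # Down-sample 'power users' to the cap.
            user_trips = rng.sample(user_trips, max_trips_per_user)
        bounded.extend((user_id, trip) for trip in user_trips)
    return bounded


def laplace_scale(max_trips_per_user, epsilon):
    """Noise scale for a trip count: sensitivity / epsilon.

    Adding or removing one user changes the count by at most
    `max_trips_per_user`, so that cap is the sensitivity.
    """
    return max_trips_per_user / epsilon


def noisy_trip_count(trips, max_trips_per_user, epsilon, seed=0):
    """Trip count after bounding, plus Laplace noise (illustrative only)."""
    bounded = bound_user_contribution(trips, max_trips_per_user, seed)
    scale = laplace_scale(max_trips_per_user, epsilon)
    rng = random.Random(seed)
    # The difference of two i.i.d. exponential variables with rate 1/scale
    # follows a Laplace distribution with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return len(bounded) + noise
```

Raising the cap keeps more trips but increases the noise scale proportionally; lowering it does the opposite. Evaluating the error over a range of caps is one way to look for the sweet spot on a given dataset. Note that naive floating-point Laplace sampling like this is fine for illustration but is not considered safe for production use.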
For fine-grained analyses, like origin-destination matrices, we find that it is difficult to achieve high utility with an acceptable privacy budget ε, even for larger datasets. Thus, the following considerations need to be made to obtain a usable mobility report: (1) Before optimizing error values, one should ask how well a given dataset is suited for user-level differential privacy. For certain analyses, a dataset might simply be too small to allow high utility and high privacy at once. (2) One should be mindful of the desired analyses. To save privacy budget, the selection of analyses should be limited to the ones that are actually needed. (3) The privacy budget should not be split equally between all analyses.
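Considerations (2) and (3) can be illustrated with a small sketch of a weighted budget split. It assumes basic sequential composition, where the ε values of all analyses computed on the same data add up to the total budget. The helper `split_budget`, the analysis names, and the weights are hypothetical and only serve as an example; how to choose good weights depends on the dataset and the use case.

```python
def split_budget(total_epsilon, weights):
    """Split a total privacy budget across analyses in proportion to weights.

    `weights` maps analysis names to non-negative numbers. Analyses that are
    not needed should simply be left out so they consume no budget.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if any(weight < 0 for weight in weights.values()):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return {
        name: total_epsilon * weight / total_weight
        for name, weight in weights.items()
    }


# Hypothetical example: give the fine-grained analysis three times the
# budget of a simple count instead of splitting equally.
budgets = split_budget(1.0, {"trip_count": 1, "od_matrix": 3})
```

In this example the simple count receives ε = 0.25 and the origin-destination matrix ε = 0.75, and the two values sum to the total budget of 1.0.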
We implemented the report as a Python package.
If you try it out, we would be happy to hear your feedback!