ANONYMIZATION AND ANALYSIS OF GEO-REFERENCED DEMOGRAPHIC AND SOCIO-ECONOMIC DATA: POPULATION, HOUSEHOLD, AND COMPANY ATTRIBUTES IN THE SYNPOP DATASET

As public institutions embrace data-driven governance, data anonymization has become vital to safeguarding the privacy of individuals. This thesis explores several anonymization methods, such as k-anonymity and target record swapping, and evaluates their effectiveness on a real-world dataset from SBB.

Gabriele Loiacono Ruta, 2025

Type of thesis: Bachelor Thesis
Client: Schweizerische Bundesbahnen (SBB)
Supervisor: Templ, Matthias
As of 2025, the Swiss Federal Railways (SBB) uses SynPop, a partly synthesized dataset combining demographic, socio-economic, and spatial data from sources such as STATPOP and STATENT, for its transport modeling. Developed with the Federal Office for Spatial Development (ARE), SynPop supports mobility simulations and strategic planning, including the 2050 National Transport Perspectives. Previous efforts to apply k-anonymity yielded mixed results. While access to these datasets currently requires some form of non-disclosure agreement (NDA), the long-term goal is to enable unrestricted public access.
The thesis follows a standard data science pipeline. First, the original datasets are analysed and discussed from a k-anonymity perspective. The anonymization is then carried out using a framework built in R specifically for this purpose. Finally, the results before and after anonymization are evaluated and discussed, leading to conclusions and recommendations from a data anonymization standpoint.
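To make the k-anonymity analysis step concrete, the following is a minimal sketch of how such a check can be computed. It is not the thesis's R framework, which is not reproduced here; it is written in Python for illustration only. The function names, column names, and toy records are all assumptions and do not come from SynPop.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return (k, class_sizes) for a non-empty list of records.

    k is the size of the smallest group of records that share the same
    combination of quasi-identifier values.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return min(class_sizes.values()), class_sizes

def risky_records(records, quasi_identifiers, k):
    """Return records whose quasi-identifier combination occurs fewer than k times."""
    _, class_sizes = k_anonymity(records, quasi_identifiers)
    return [
        r for r in records
        if class_sizes[tuple(r[q] for q in quasi_identifiers)] < k
    ]

# Toy data: invented for illustration, not taken from SynPop.
people = [
    {"age_group": "30-39", "municipality": "Olten", "household_size": 2},
    {"age_group": "30-39", "municipality": "Olten", "household_size": 2},
    {"age_group": "30-39", "municipality": "Olten", "household_size": 2},
    {"age_group": "60-69", "municipality": "Bern", "household_size": 1},
    {"age_group": "60-69", "municipality": "Bern", "household_size": 1},
    {"age_group": "80-89", "municipality": "Bern", "household_size": 5},
]
QUASI_IDENTIFIERS = ["age_group", "municipality", "household_size"]

k, class_sizes = k_anonymity(people, QUASI_IDENTIFIERS)
flagged = risky_records(people, QUASI_IDENTIFIERS, k=2)
print(f"dataset is {k}-anonymous; {len(flagged)} record(s) violate k=2")
```

In this toy example the single record with a unique attribute combination determines the result: one unique record is enough to reduce the whole dataset to 1-anonymity, which is why the choice of quasi-identifiers matters so much in the analysis.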
The thesis delivers a functional and adaptable anonymization framework, capable of handling diverse datasets through parameter adjustments. Beyond the technical implementation, the in-depth analysis of the original datasets enabled the formulation of specific recommendations, particularly regarding the quality and structure of synthetic variables already present in the data. The study also identifies key limitations and outlines potential next steps for improving data privacy practices for the SynPop dataset. Notably, this thesis represents a first effort to address k-anonymity and target record swapping within SBB’s data ecosystem, laying a foundation for future research on the matter.
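For readers unfamiliar with record swapping, the general idea can be sketched as follows: records that are rare on their quasi-identifiers exchange their geographic attribute with a similar record from another area. This is a simplified Python illustration, not the thesis's R implementation. The function `targeted_swap`, its parameters, and the toy households are hypothetical.

```python
import random
from collections import Counter

def targeted_swap(records, quasi_identifiers, geo, similar_on, k, seed=0):
    """Swap the `geo` value of each risky record with a donor from another area.

    A record is risky when its quasi-identifier combination occurs fewer
    than k times. A donor must lie in a different area and match the risky
    record on all `similar_on` variables. Returns a new list of records.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # copy so the input stays unchanged

    def key(r):
        return tuple(r[q] for q in quasi_identifiers)

    sizes = Counter(key(r) for r in out)
    risky = [i for i, r in enumerate(out) if sizes[key(r)] < k]
    swapped = set()
    for i in risky:
        if i in swapped:
            continue  # already moved as a donor for another record
        donors = [
            j for j, r in enumerate(out)
            if j not in swapped
            and r[geo] != out[i][geo]
            and all(r[v] == out[i][v] for v in similar_on)
        ]
        if not donors:
            continue  # no suitable donor; record stays where it is
        j = rng.choice(donors)
        out[i][geo], out[j][geo] = out[j][geo], out[i][geo]
        swapped.update((i, j))
    return out

# Toy households: invented for illustration, not taken from SynPop.
households = [
    {"id": 1, "municipality": "Olten", "household_size": 2},
    {"id": 2, "municipality": "Olten", "household_size": 2},
    {"id": 3, "municipality": "Olten", "household_size": 4},
    {"id": 4, "municipality": "Bern", "household_size": 2},
    {"id": 5, "municipality": "Bern", "household_size": 2},
    {"id": 6, "municipality": "Bern", "household_size": 4},
]
swapped_households = targeted_swap(
    households,
    quasi_identifiers=["municipality", "household_size"],
    geo="municipality",
    similar_on=["household_size"],
    k=2,
)
print([(h["id"], h["municipality"]) for h in swapped_households])
```

Swapping leaves the number of households per municipality unchanged and does not by itself raise k. Its protection comes from the uncertainty it creates about whether a rare record really belongs to the area in which it appears.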
Degree program: Business Administration International Management (Bachelor)
Keywords: Data Science, k-Anonymity, Data Anonymization, R, RStudio, Data Analysis, Data Governance
Confidentiality: public
Client location: Bern
Language of thesis: English
Degree program location: Olten