MaskMyPy: python tools for performing and analyzing geographic masks

Swanlund, David; Schuurman, Nadine

doi:10.1186/s12942-025-00399-6

Methodology
Open access
Published: 09 May 2025

MaskMyPy: python tools for performing and analyzing geographic masks

David Swanlund¹ &
Nadine Schuurman¹

International Journal of Health Geographics volume 24, Article number: 12 (2025) Cite this article

156 Accesses
Metrics details

Abstract

Background

Geographic masking is an important but under-utilized technique for protecting and disseminating sensitive geospatial health data. Geographic masks work by displacing static point locations such that the people those locations describe cannot be identified, while at the same time preserving important spatial patterns for analysis. Unfortunately, there is a lack of available tooling surrounding geographic masks which we believe creates an unnecessary barrier towards the adoption of these techniques. As such, this article presents a set of tools for performing, evaluating, and developing geographic masks, called MaskMyPy.

Results

MaskMyPy is an open-source Python package that includes functions for performing geographic masks, including donut, street, location swapping, and Voronoi masks. It also includes a range of tools for evaluating the results of these masks, both with regard to privacy and information loss. Finally, it includes a special class called the ‘Atlas’ that aims to dramatically streamline mask execution and evaluation. We conducted a short case study to illustrate the power of MaskMyPy in geographic masking research, and in doing so showed that mask performance can range widely due solely to randomization. As such, we recommend that masking researchers test their masks repeatedly across a variety of test datasets.

Conclusion

MaskMyPy makes it easy to apply a variety of geographic masks to a set of sensitive points and then measure which mask provided the most privacy while suffering the least information loss. We believe this style of tooling is important to not only make geographic masks accessible to non-experts, but to enable expert users to better interrogate the masks they develop, and in doing so drive the geographic masking discipline forward.

Background

Whether through the use of spatial statistics or cartographic visualization, geospatial data can be a powerful resource for researchers looking to link health outcomes to environmental risk factors. As important as this type of research is, however, it cannot come at the expense of patient privacy; as the saying goes, ‘first, do no harm’. Geographic masks are techniques that offer an avenue for exploring and distributing useful spatial health data without violating the privacy of the people that data describes. Sadly, geographic masks are not readily available to the users that would benefit most from them. Major GIS applications and libraries alike lack geographic masking functions, requiring would-be users to dive into niches of academic literature to identify, understand, implement, and evaluate the masks that would benefit them. For many, this friction simply puts geographic masks out of reach. This can lead to one of two problems: either sensitive data gets published without being properly anonymized, violating privacy, or it stays locked away, preventing other researchers from being able to explore it whatsoever [1, 2]. The goal of this article is to showcase geographic mask tooling that makes both of these outcomes more easily avoidable.

Fundamentally, geographic masks transform spatial point data with the intent to protect geoprivacy while simultaneously preserving the utility of that data for analysis [3]. Unfortunately, these two objectives are fundamentally at odds: the more we do to protect privacy in a spatial dataset, the less useful the data becomes. The goal of any geographic mask is to optimize this trade-off, such that privacy is sufficiently protected while incurring the least amount of information loss. Many geographic masks have been developed over the years, each attempting in their own way to best satisfy this fundamental goal [4,5,6]. As a result, there are now a variety of masks to choose from. Actually making this selection, however, can be quite difficult.

This difficulty stems from the fact that every mask performs differently depending on the underlying nature of the data, and measuring these differences can be the basis of an entire research publication by itself. Many geographic masking studies use a wide variety of methods for assessing privacy protection and information loss, often with interpretations that are at odds with each other [7]. Moreover, even if a would-be masker found three studies that each used identical methods to assess a handful of different masks, their results would be difficult to utilise as they are specific to the dataset being masked. For instance, variations in population density and urban form can greatly affect the outcome of a masking procedure [8]. A mask applied to data in a sprawling city like Orlando would likely perform quite differently than in a highly dense city like New York.

The underlying structure of the data itself also affects geographic masking: how many points are there, are they dense or sparse, who or what are they describing, are they clustered, do they span extremely mixed population densities, and do they represent fixed locations or do they move over time? Further complicating things is the fact that most masks have parameters, such as maximum displacement distance, that must be taken into account. It is impossible to know what the optimal parameters are without first testing a variety of them. Finally, mask performance can vary solely due to the randomization element that many masks rely on, meaning that studies that evaluate masks should ideally do so over many iterations of the mask. Ultimately, each of these factors add up such that just because one study showed a given masking technique to be the best for their data and context, does not mean it will be the best for your data and context.

This means that it is important to perform some level of testing and validation on a masked dataset before publishing it. Such testing may not need to be as comprehensive as a full research study, but there should be some level of validation nonetheless. Unfortunately, this is both difficult and time consuming. Geographic masking can already be a burdensome process even without performing this analysis, and studies have shown that many researchers forgo privacy protection entirely and simply publish sensitive location data in maps [1, 2].

Our previous research has attempted to reduce this friction by developing both masks and tools that are more simple to use [9, 10]. This article is an extension of this research, and describes an open-source Python package for geographically masking point data, called MaskMyPy. MaskMyPy was first introduced within a 2020 article proposing a new method for geographic masking, called Street Masking [10]. At the time, the software largely focused on providing a small number of easy-to-use geographic masking functions for anonymizing GeoDataFrames, with the Street mask being its primary feature. This article presents a new, more powerful iteration of this tool with a much larger focus on the analysis aspect of geographic masking. It allows users to quickly execute a number of geographic masks along with any combination of parameters on a given dataset, all while automatically calculating privacy and information loss metrics. This allows for rapid comparison and evaluation of mask performance, making it far less burdensome to robustly protect geospatial data. Moreover, its features may help aid other researchers in developing new geographic masks, providing a framework for this research community to build upon.

This article begins with a brief overview of geographic masking and existing tools. Next, it highlights the core features of MaskMyPy, including its inbuilt masks, analysis tools, and management features. These features are then used to analyze the results of hundreds of mask iterations, with the goal of highlighting a range of often overlooked factors that impact mask performance. Finally, the article concludes by discussing the importance of, and need for, comprehensive privacy tooling in academic research.

Geographic masks

Geographic masks have steadily evolved over the last two and a half decades. First proposed by Armstrong et al. [3], early masks are best exemplified by affine transformations and random perturbation. Affine transformations include techniques that translate, rotate, or scale point patterns globally by a predefined value in order to protect privacy. A concern with these techniques, however, is that re-identifying a small subset of the data can lead to the entire dataset being re-identified as well, as all points are transformed using the same values. A partial solution to this is to split the dataset into a grid and perform different affine transformations locally to each cell [11]. Random perturbation, on the other hand, provides a much stronger solution to this problem, as every point is displaced randomly within a given maximum distance [3]. Because each point is treated independently, re-identifying one point (or even a small subset of points) in a given dataset cannot be used to then re-identify other points in the same dataset.

Subsequent masks often tweak this basic formula in order to make up for random perturbation’s weaknesses. For instance, a weakness of random perturbation is that points may only be displaced small distances (e.g. 2 m), providing almost no privacy protection. Donut masking adds a minimum displacement distance to solve this issue [4]. Another weakness is that points can be displaced to impossible locations, such as the ocean. Location swapping and the verified neighbor mask both solve this issue by leveraging contextual address data; instead of displacing points entirely at random, they will relocate given a point to a randomly selected address nearby, helping to ensure that the masked data remains more realistic [6, 12].

However, other masks have been developed that take entirely different approaches. For instance, Adaptive Areal Elimination (AAE) seeks to provide a guaranteed minimum level of privacy [5]. It does this by iteratively aggregating census polygons until their combined population reaches a minimum threshold. Then, points are displaced within each aggregated polygon. This approach ensures that a minimum level of privacy is achieved even when the population is heterogeneously distributed.

Of course, researchers have developed more masking techniques than can be described here. Briefly, these include Street masking (which relocates points along the OpenStreetMap road network) [10], Voronoi masking (which relocates points by creating voronoi polygons and snapping them to the nearest edge) [13], multi-scale masking (which relocates points by strategically switching digits in their coordinates on the Military Grid Reference System) [14], Triangular Displacement (which relocates points based on multiple risk factors with a focus on computational efficiency) [15], NRand-K (which relocates each point based on the density of other points nearby) [16], and Adaptive Voronoi masking (which combines elements of Voronoi masking and Adaptive Areal Elimination) [17]. This is not an exhaustive list, and there are further variations of masks one may consider, such as whether to use a uniform or gaussian distribution when performing random perturbation, or whether to select the displacement distance entirely at random or by weighing it using population data. Indeed, the wide variety of ways to go about geographic masking underscores the importance of evaluation and tooling.

Mask evaluation

Geographic masks are evaluated based on two primary concerns: how much they protect privacy, and how much information loss they incur. Beginning with privacy protection, researchers have largely settled on one primary evaluation metric: spatial k-anonymity. Spatial k-anonymity is an adaptation of the popular k-anonymity metric [18,19,20], which measures the uniqueness of records in a given dataset. For example, a record is 10-anonymous if it is indistinguishable from 9 other records in the same dataset. Spatial k-anonymity on the other hand may consider a masked address as 10-anonymous if there are 9 other addresses closer to it than to its original, unmasked location [21]. However, as Seidl et al. [7] note, there is a degree of disagreement in the literature regarding this definition. More specifically, should k-anonymity be measured relative to the original sensitive location, or relative to the masked location? In our opinion, and given that the goal of geographic masking is to prevent reverse geocoding of the masked data, it seems fitting to measure it based on the masked location, as this is what an adversary seeking to re-identify the address would be dealing with. Nevertheless, this does represent an inconsistency in the literature that easily goes unnoticed.

Moreover, it must be noted that spatial k-anonymity estimates are somewhat imprecise. When census data is used in the calculation, there is an inherent assumption that the population within the census area is uniformly distributed, which is rarely, if ever, the case. The use of address data ameliorates this problem, but tends to use addresses as a proxy for individuals, which introduces its own issues. Finally, spatial k-anonymity has also been measured without considering a background population, and instead considers a masked address 10-anonymous if 9 other masked addresses within the dataset are closer to the unmasked address than the masked address [22]. This coincides more closely with the original formulation of k-anonymity, but is extremely difficult to satisfy spatially without incurring extreme information loss.

While spatial k-anonymity poses a dominant yet somewhat ambiguous measure of privacy protection, measures of information loss are both more varied and yet more clear cut. Fundamentally, measuring information loss means measuring changes to the point pattern introduced by the masking process. This is commonly achieved by looking at how masking changes the number, location, and/or size of clusters that can be identified in the data [4, 6, 12, 19, 23]. For instance, one can use Ripley’s K function to measure clustering at multiple spatial scales on both the unmasked and masked data, and then plot the difference between the two [6, 12]. Any difference, whether it be towards greater clustering or dispersion, is an indication of information loss. Other clustering metrics that are commonly used include the average nearest neighbor index [6, 15] and SatScan [4, 12, 19, 23]. Alternatively, one can look towards changes in descriptive statistics, such as how much the mean center of the point pattern has drifted due to masking [12, 13], or by simply measuring average distance that each point was displaced [5, 12, 19]. These are only some of the many ways to quantify information loss. Ultimately, it is best to look towards multiple complementary measures to assemble a more complete picture of what is lost to the masking process.

Existing tools

There exist a small number of tools that use geographic masks for the purpose of privacy protection, which are summarized in Table 1. In 2019 we developed MaskMy.XYZ, which is a web application for applying donut masking to sensitive locations within the browser [9]. While being very easy to use and providing rudimentary tools for measuring privacy and information loss, it is primarily designed for quickly masking small datasets rather than being a comprehensive masking application. Alternatively, GeoPriv is a QGIS plugin that comes closer to this goal, as it offers three separate masking techniques: spatial clustering, Laplacian noise, and the NRandK mask [16, 24]. GeoPriv is a plugin for QGIS, and has the advantage of being usable directly inside an existing and popular GIS environment.

Table 1 A summary table comparing some of the different tools available that allow users to geographically mask their data, or make use of geographic masks to achieve their desired purpose

Full size table

While MaskMy.XYZ and GeoPriv are explicitly for the purpose of geographic masking, other tools make use of geographic masking to achieve a larger goal. For instance, Privy.to is another web-based application that leverages geographic masks, temporal obfuscation, and encryption to allow users to privately and securely share their location with others [25]. MapSafe takes a similar approach by combining geographic masking, encryption, and blockchain technology to allow users to anonymize sensitive geospatial data, share it based on different levels of trust, and securely notarize it on the Ethereum blockchain without exposing either the masked or unmasked data [26].

However, when compared to the wealth of geographic masking techniques that have been developed in GIScience, there is a clear lack of GISystems tooling to translate these into reality. As Boeing [27] concisely writes, “to conduct better science, we need to build better tools”. Indeed, we are in an age where spatial data abounds, but often this data is highly sensitive and cannot be easily shared without sparking legitimate privacy concerns. Translating the wealth of GIScience theory about geoprivacy and geomasking into GISystems tooling is a key step towards unlocking this data and allowing other researchers to tap into its latent potential. Better tooling can also improve the science of geographic masking itself by reducing the evaluation overhead. These are the twin goals of MaskMyPy.

MaskMyPy: an overview

MaskMyPy was initially released in 2020 as an alpha stage software package, and included Street masking, Donut masking, as well as functions to calculate displacement distance and k-anonymity [10]. However, many lessons were learned during development about how best to implement these features in a clear and maintainable way. Moreover, the package was released as an alpha and as such uptake has remained limited. While we have had several users reach out with positive feedback, we wanted to improve the code-base to make it better to both use and contribute to, and in doing so increase uptake. This motivated us to rewrite the vast majority of the code-base to be more maintainable, readable, tested, and well packaged. From a user’s perspective, the interface was rewritten to be simpler, and many features were added to enhance the analysis aspect of geographic masking. This section outlines this new, enhanced version of the software.

MaskMyPy has three primary components: geographic masks, analysis functions, and the Atlas. This section briefly walks through using each of these components to mask a dataset and evaluate the level of privacy protection and information loss that is incurred. The goal is to demonstrate, through several short code snippets, the degree to which MaskMyPy can streamline geographic masking workflows.

Geographic masks

First, the included geographic masks are Python functions that take in a GeoDataFrame and various masking parameters, and return a copy of that GeoDataFrame with masked coordinates. A GeoDataFrame is a data structure that is provided by the highly popular Geopandas Python library, which provides a wide range of tools for working with spatial data. Four masks have been implemented: donut masking, location swapping, voronoi masking, and street masking. Figure 1 provides a brief explanation of how each of these masks displace sensitive points. Donut and voronoi masking provide the fastest execution times, but generally yield a worse trade-off between privacy protection and information loss than location swapping or street masking, which make use of supplemental data to better inform masking distances. As an example, the following code snippet performs donut masking, displacing points between 20 and 200 m, and producing a map (shown in Fig. 2) that visualizes how each point was displaced:

Analysis functions

As previously mentioned, it is important to evaluate how well this mask performed, particularly with regard to privacy protection. MaskMyPy’s analysis functions drastically simplify this process. Spatial k-anonymity can be calculated using either census data (e.g. polygons with associated population counts), or address points. Building on the previous example, we will now calculate the address-based k-anonymity of our masked GeoDataFrame and then use a function that calculates the percentage of points that achieved a k-anonymity above a threshold value (in this case, 5). This technique was first used by [6], which we term here as k-satisfaction:

As we can see, 80% of points have achieved a k-anonymity of at least 5. If we are satisfied with this result, we can proceed to measure information loss. Several metrics are available for this purpose, including: central drift, displacement distance, nearest-neighbor distance, and Ripley’s K function. A number of higher-level functions are also available which summarize the row-level outputs of some of these metrics, facilitating interpretation. Continuing with our example, we will measure and then summarize the displacement distance of each point:

The Atlas

While the above exemplifies some of the individual functions that MaskMyPy provides, the Atlas is a Python class that is designed to conveniently tie them together. Put briefly, it acts as a ‘mask manager’ that allows you to quickly test any number of combinations of masks and their associated parameters, automatically performing the evaluation for you and keeping track of the results. Essentially, it lets users perform a wide array of analyses in a very short amount of code, which we hope not only helps users to better validate their masks, but enables masking researchers to ask new questions.

What makes much of this possible is the Atlas.mask method, which takes in a simple masking function (including custom ones from outside of MaskMyPy) and any keyword arguments for that function. When the method is called, it will apply the masking function to the sensitive points, perform a range of analyses on the resulting masked GeoDataFrame, and then add a ‘candidate’ to a list. A candidate is a simple Python dictionary that contains various statistics describing the mask’s performance (see Table 2), as well as all the information required to regenerate an exact copy of the masked GeoDataFrame. The latter point is significant as it means that the masked GeoDataFrame itself can be discarded from memory after being evaluated, then recreated later if needed. This allows users to iterate over hundreds, or even thousands, of combinations of masks and parameters without running out of memory or disk space. The following code snippet shows how the Atlas can be used to apply donut masking to a set of sensitive points, and the resulting information that is generated:

Table 2 A list of the various statistics that can be automatically generated for each candidate

Full size table

Importantly, the Atlas also helps users find the candidates that meet their criteria. For instance, if the above Atlas contained many candidates, a user can remove any candidates with a minimum k-anonymity below 5, then sort by the median displacement distance using prune and sort methods. Finally, they can export a table (specifically, a Pandas DataFrame, shown in Table 3) of all the remaining candidates:

Table 3 A DataFrame generated by the Atlas, illustrating how candidates can be filtered, sorted, and exported based on various statistics. Note that the DataFrame has been significantly truncated for brevity

Full size table

We believe that this degree of ease-of-iteration not only enables users to better protect their data, but also allows scholars researching geographic masks to begin to ask new questions about the masks they are developing. The next section exemplifies this by using MaskMyPy to explore mask variability.

Methodology

Case study: how variable are geographic masks?

Studies that present new geographic masks often follow a general pattern: describe a newly developed mask, then compare it to other masks using a set of test data, a small number of parameters for each mask (e.g. varying the maximum displacement distance), which are each executed only once. However, randomization is a critical component of most masks, introducing the possibility that the mask performed well in that particular execution, but may not perform as well in other executions. Using MaskMyPy, particularly the Atlas, we present a short case study on mask consistency, and examine how randomization and dataset size can affect privacy protection.

Data

The primary dataset used for this case study is a set of synthetically generated points in Vancouver, Canada, which was originally created in [10]. Synthetic datasets are often used in geographic masking research due to obvious privacy risks of using actual sensitive data [6, 8, 28]. More specifically, this dataset includes 7206 points, with clusters deliberately distributed across areas of varying population density. Further details as to the exact generation process can be found in [10]. To test how the size of a dataset affects mask variability, we randomly sampled points from this dataset to create five smaller datasets of 50, 100, 500, 1000, and 5000 points. Address data is also required for both privacy calculations and the location swapping mask, and was obtained from [29].

Methods

Using MaskMyPy’s Atlas feature, donut masking, location swapping, and street masking were repeated 50 times for each of the five sets of sensitive points. Donut masking and location swapping used minimum and maximum displacement distances of 20 and 200 m, respectively. Street masking used a range of 2 to 9 nodes along the street network, which was selected to produce similar minimum and median displacement distances as donut masking.

To evaluate privacy protection, we selected the k-satisfaction metric described above, which the Atlas automatically calculates at three thresholds of k: 5, 25, and 50. To reiterate, if a dataset contains 100 masked points, and 93 of those points achieved a k-anonymity value greater than or equal to 25, then the k-satisfaction would be 93%. We then created box plots depicting the range of k-satisfaction results across the 50 iterations of each pairing of mask and dataset. This represents the variability of privacy protection provided by the mask.

Results

Results show that privacy protection can vary considerably across mask iterations at lower point counts, with higher counts producing more consistent results, as illustrated in Figs. 3, 4, 5 and detailed in Table 4. While this general trend is certainly expected, the range of k-satisfaction performance between different masks should not be overlooked when using fewer than 1000 points. With 500 points and a minimum k of 50, k-satisfaction varied 7–9%, and with 100 points this rose to approximately 20%.

Table 4 K-Satisfaction minimums, maximums, and ranges across 50 iterations for each combination of mask and test dataset size

Full size table

Put into context, this means that if two identical masks were tested against each other, one could perform 20% better than the other, despite the only difference being due to randomization. While 20% is at the extreme of this particular test, in practice it is likely that even an 7% difference in k-satisfaction performance could lead one to conclude that one mask is ‘superior’.

It must be noted that this case study was not designed to compare each of the three masks relative to one another, as only a single set of parameters was used for each mask; tuning these parameters would likely lead to very different results. Rather, this analysis is meant to illustrate how varied masks can perform due to the randomization that underpins many of them. Indeed, researchers comparing masks should consider the potential for such variation in their study design, whether this be by using larger dataset sizes or multiple iterations of each set of masking parameters. Given that masks may be used by end users on both small and large datasets, it is likely wise to use both approaches side by side to ensure that the mask is performant across a variety of data.

Discussion & conclusion

Geographic masking is a small but important niche within GIScience. Huge volumes of spatial data are kept behind closed doors in the interest of privacy, but geographic masks offer the potential to open some of this data up to health researchers. Simultaneously, sensitive health data that should be protected with geographic masks is often published for the world to see, a practice that occurs far too often and that may (and perhaps, should) erode public trust [1, 2]. Finally, there is data that is protected before being released, but is based on areal aggregation, which can limit the usefulness of the data by impeding cluster detection [3]. MaskMyPy seeks to tackle each of these cases by removing the friction of robust geographic masking, and by fortifying research in the space.

Of course, it is impossible to make masking entirely friction-less. Users still must understand some of the fundamental concepts that underlie geographic masks, such as a mask’s overall mechanism for displacing points, how k-anonymity relates to privacy, and how information loss may be measured and represented. Tooling alone cannot solve these issues. But just as a user need not be an expert in computational geometry to understand the value that a ‘union’ or ‘intersect’ tool in GIS software may provide them, they need not be experts in geographic masking to get value from masking tools. Furthermore, tools can themselves work to educate users of underlying concepts; there is a reason GIS courses start by having students experiment and play with data and tools before unpacking how those tools work. In this way, we hope that MaskMyPy provides tools for researchers to play with, learn from, and ultimately incorporate into their workflows such that privacy is better protected while rich geospatial datasets are simultaneously opened up to the community.

Moreover, implementation matters, and MaskMyPy’s analysis tools help more advanced users interrogate the masks they implement. This applies not just to novel masks, but existing masks as well, which often lack publicly accessible implementations that users can reference, if not use directly. Indeed, the smallest decisions made when codifying methods can, whether on their own or in aggregate, produce significantly different results. For instance, there are a number of ways one can perform a mask as simple as random perturbation. In one method, a buffer is created around a given sensitive point, then a new point is randomly sampled within the buffer, producing the masked point. However, this method is much slower than just randomly selecting a distance and direction, and translating the point’s geometry accordingly. While this may seem like a simple optimization, the buffer-based approach fundamentally distributes points differently, with displacement distances being biased more towards the outside of the buffer area (where there is more area). The translation approach, on the other hand, produces a far more uniform distribution of displacement distances. This is only one example of myriad decisions made when programming that rapidly branch out into often unexpected results.

However, compounding these issues is that modern software development often heavily relies on third party libraries, which themselves can suffer the same problems. Much like science more broadly, we build on the shoulders of giants, and in doing so inherit many of their idiosyncrasies, whether we know it or not. Of course, this applies to MaskMyPy itself, and anything built on top of it. Comprehensive software testing can help to mitigate this problem, but is not a silver bullet. Software testing is difficult; it can be hard to know what to test, never mind how to meaningfully test it. This is not to argue against testing, but rather to underscore the importance of open source software in academic research. In this case, MaskMyPy provides masks that can act as reference implementations, and that other researchers can contribute to, whether that be by adding new masks, improving tests, or finding bugs. As such, we would encourage other researchers to not just use MaskMyPy’s masks and analysis tools when developing new masks, but to contribute these masks back into MaskMyPy. Over time, we hope to build a library of masks that the geomasking community can leverage and build around. To join this effort, visit https://github.com/TheTinHat/MaskMyPy.

Availability of data and materials

The synthetic datasets analysed during this study are available at https://osf.io/6uqkv. MaskMyPy is available at https://github.com/TheTinHat/MaskMyPy.

References

Haley DF, Matthews SA, Cooper HLF, Haardörfer R, Adimora AA, Wingood GM, et al. Confidentiality considerations for use of social-spatial data on the social determinants of health: Sexual and reproductive health case study. Soc Sci Med. 2016;166:49–56.
Article PubMed PubMed Central Google Scholar
Kounadi O, Leitner M. Why does geoprivacy matter? The scientific publication of confidential data presented on maps. J Empir Res Hum Res Ethics. 2014;9:34–45. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/1556264614544103.
Article PubMed Google Scholar
Armstrong MP, Rushton G, Zimmerman DL. Geographically masking health data to preserve confidentiality. Stat Med. 1999;18:497–525.
Article CAS PubMed Google Scholar
Hampton KH, Fitch MK, Allshouse WB, Doherty IA, Gesink DC, Leone PA, et al. Mapping health data: improved privacy protection with donut method geomasking. Am J Epidemiol. 2010;172:1062–9.
Article PubMed PubMed Central Google Scholar
Kounadi O, Leitner M. Adaptive areal elimination (AAE): a transparent way of disclosing protected spatial datasets. Comput Environ Urban Syst. 2016;57:59–67.
Article Google Scholar
Zhang S, Freundschuh SM, Lenzer K, Zandbergen PA. The location swapping method for geomasking. Cartogr Geogr Inf Sci. 2017;44:22–34. https://doiorg.publicaciones.saludcastillayleon.es/10.1080/15230406.2015.1095655.
Article Google Scholar
Seidl DE, Jankowski P, Clarke KC. Privacy and false identification risk in geomasking techniques. Geogr Anal. 2017. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/gean.12144.
Article Google Scholar
Allshouse WB, Fitch MK, Hampton KH, Gesink DC, Doherty IA, Leone PA, et al. Geomasking sensitive health data and privacy protection: an evaluation using an E911 database. Geocarto Int. 2010;25:443–52. https://doiorg.publicaciones.saludcastillayleon.es/10.1080/10106049.2010.496496.
Article PubMed PubMed Central Google Scholar
Swanlund D, Schuurman N, Brussoni M. MaskMy.XYZ: an easy-to-use tool for protecting geoprivacy using geographic masks. Trans GIS. 2020;24:390–401. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/tgis.12606.
Article Google Scholar
Swanlund D, Schuurman N, Zandbergen P, Brussoni M. Street masking: a network-based geographic mask for easily protecting geoprivacy. Int J Health Geogr. 2020. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12942-020-00219-z.
Article PubMed PubMed Central Google Scholar
Leitner M, Curtis A. Cartographic guidelines for geographically masking the locations of confidential point data. Cartogr Perspect. 2004;49:22–39.
Article Google Scholar
Richter W. The verified neighbor approach to geoprivacy: an improved method for geographic masking. J Eposure Sci Environ Epidemiol. 2018;28:109–18.
Article Google Scholar
Seidl DE, Paulus G, Jankowski P, Regenfelder M. Spatial obfuscation methods for privacy protection of household-level data. Appl Geogr. 2015;63:253–63.
Article Google Scholar
Clarke KC. A multiscale masking method for point geographic data. Int J Geogr Inf Sci. 2016;30:300–15. https://doiorg.publicaciones.saludcastillayleon.es/10.1080/13658816.2015.1085540.
Article Google Scholar
Murad A, Hilton B, Horan T, Tangenberg J. Protecting patient geo-privacy via a triangular displacement geo-masking method. In: Proceedings of the 1st ACM SIGSPATIAL international workshop on privacy in geographic information collection and analysis. New York, NY, USA: ACM; 2014. p. 6:1–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1145/2675682.2676399
Zurbarán M, Wightman P, Brovelli M, Oxoli D, Iliffe M, Jimeno M, et al. NRand-K: minimizing the impact of location obfuscation in spatial analysis. Trans GIS. 2018;22:1257–74. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/tgis.12462.
Article Google Scholar
Polzin F, Kounadi O. Adaptive Voronoi masking: a method to protect confidential discrete spatial data. 2021. p. 17, 5287808 bytes. https://doiorg.publicaciones.saludcastillayleon.es/10.4230/LIPIcs.GIScience.2021.II.1
Sweeney L. K-anonymity: a model for protecting privacy. Internat J Uncertain Fuzziness Knowl Based Syst. 2002;10:557–70.
Article Google Scholar
Cassa CA, Grannis SJ, Overhage JM, Mandl KD. A context-sensitive approach to anonymizing spatial surveillance data: impact on outbreak detection. J Am Med Inform Assoc. 2006;13:160–5. https://doiorg.publicaciones.saludcastillayleon.es/10.1197/jamia.M1920.
Article PubMed PubMed Central Google Scholar
Gruteser M, Grunwald D. Anonymous usage of location-based services through spatial and temporal cloaking. In: Proceedings of the 1st international conference on mobile systems, applications and services. New York, NY, USA: ACM; 2003. p. 31–42. https://doiorg.publicaciones.saludcastillayleon.es/10.1145/1066116.1189037
Zandbergen P. Ensuring confidentiality of geocoded health data: assessing geographic masking strategies for individual-level data. Adv Med. 2014;2014: 567049.
Article PubMed PubMed Central Google Scholar
Broen K, Trangucci R, Zelner J. Measuring the impact of spatial perturbations on the relationship between data privacy and validity of descriptive statistics. Int J Health Geogr. 2021;20:3. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12942-020-00256-8.
Article PubMed PubMed Central Google Scholar
Olson KL, Grannis SJ, Mandl KD. Privacy protection versus cluster detection in spatial epidemiology. Am J Public Health. 2006;96:2002–8.
Article PubMed PubMed Central Google Scholar
Duque JP, Pugliese A, Wightman P. GeoPriv; 2019. https://github.com/Diuke/GeoPriv
McKenzie G, Romm D, Zhang H, Brunila M. PrivyTo: a privacy-preserving location-sharing platform. Trans GIS. 2022;26:1703–17. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/tgis.12924.
Article Google Scholar
Sharma P, Martin M, Swanlund D. MapSafe: A complete tool for achieving geospatial data sovereignty. Trans GIS. 2023;27:1680–98. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/tgis.13094.
Article Google Scholar
Boeing G. The right tools for the job: the case for spatial science tool-building. Trans GIS. 2020;24:1299–314. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/tgis.12678.
Article Google Scholar
Croft WL, Shi W, Sack J-R, Corriveau J-P. Location-based anonymization: comparison and evaluation of the Voronoi-based aggregation system. Int J Geogr Inf Sci. 2016;30:2253–75. https://doiorg.publicaciones.saludcastillayleon.es/10.1080/13658816.2016.1172314.
Article Google Scholar
OpenAddresses.io. OpenAddresses. OpenAddresses; 2019. http://openaddresses.io/

Download references

Acknowledgements

Not applicable.

Funding

This research was funded by the Social Sciences and Humanities Research Council of Canada (SSHRC).

Author information

Authors and Affiliations

Department of Geography, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
David Swanlund & Nadine Schuurman

Authors

David Swanlund
View author publications
You can also search for this author inPubMed Google Scholar
Nadine Schuurman
View author publications
You can also search for this author inPubMed Google Scholar

Contributions

DS conceptualized and coded the MaskMyPy package. DS performed the analysis in the case study of mask consistency. DS and NS wrote and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to David Swanlund.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Swanlund, D., Schuurman, N. MaskMyPy: python tools for performing and analyzing geographic masks. Int J Health Geogr 24, 12 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12942-025-00399-6

Download citation

Received: 14 July 2024
Accepted: 14 April 2025
Published: 09 May 2025
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12942-025-00399-6

MaskMyPy: python tools for performing and analyzing geographic masks