Multi Attribute Stratified Sampling: An Automated Framework for Privacy-Preserving Healthcare Data Publishing with Multiple Sensitive Attributes

Open Access Article

Volume 11, Issue 1, Page No 51–68, 2026

Division of Information Technology and Sciences, Champlain College, Burlington, 05401, USA
*Author to whom correspondence should be addressed. E-mail: vthammannagowda@champlain.edu

Adv. Sci. Technol. Eng. Syst. J. 11(1), 51–68 (2026); DOI: 10.25046/aj110106

Keywords: Healthcare data anonymization, k-anonymity, Privacy–utility tradeoff, Multiple sensitive attributes, Automated parameter optimization, Machine learning utility

Received: 31 December 2025, Revised: 26 January 2026, Accepted: 29 January 2026, Published Online: 9 February 2026
(This article belongs to the SP20 (Special Issue on Multidisciplinary Frontiers in Engineering, Computing and Applied Sciences 2026) & Section Theory & Methods in Computer Science (CTM))

The accumulation and analysis of large-scale patient data have enabled breakthrough discoveries: pattern-based early-warning flags for disease, evidence of medication efficacy, and local population health trends that would be impossible to uncover with traditional paper-based records. However, these benefits bring unique challenges: sharing data for research and analysis is subject to mandatory regulatory requirements that demand a careful balance between privacy protection and data utility, especially when the data contains multiple sensitive attributes. We propose a framework, Multi Attribute Stratified Sampling (MASS), that achieves automatic parameter optimization by decoupling the sanitization process from manual privacy parameter configuration. Most traditional privacy-preserving techniques require experts to specify privacy parameters, such as the k, l, and t values of k-anonymity, l-diversity, and t-closeness respectively, based on intuition or trial and error, resulting in sub-optimal privacy–utility tradeoffs. In contrast, our framework employs a self-tuning paradigm built on three modules: GetAnonymized, CandidateBuilder, and Optimizer. CandidateBuilder produces multiple anonymized versions of the original preprocessed data by iteratively calling GetAnonymized over a range of anonymization levels, creating a solution space. The Optimizer then scans this solution space with an objective function to determine the optimal anonymized version. The objective function combines privacy loss, information loss, and classification recall to discover the privacy parameters that yield the best balance between privacy protection and data utility, eliminating the burden of manually fine-tuning privacy parameters and ensuring reproducible outcomes across varied healthcare datasets and analytical contexts.
Experimental validation on four datasets (1k, 10k, 100k, and 10M records) demonstrates that MASS achieves strong privacy protection, with a privacy loss below 0.25 across all datasets, while maintaining over 95% recall retention on datasets exceeding 10k records. Because computing an optimal anonymized dataset is NP-hard, MASS provides a polynomial-time heuristic solution, which makes practical implementation and scalability for real-world deployment feasible.
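The self-tuning loop described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the single quasi-identifier (age), the bin-width generalization inside get_anonymized, the metric definitions (privacy_loss as the fraction of singleton equivalence classes, info_loss as normalized generalization error), and the equal objective weights are all assumptions introduced for the example; the paper's objective additionally incorporates classification recall.

```python
from collections import Counter

def get_anonymized(records, k):
    # Toy stand-in for GetAnonymized: generalize the 'age' quasi-identifier
    # into wider bins as the anonymization level k grows, merging small
    # equivalence classes.
    width = k * 5
    return [{**r, "age": (r["age"] // width) * width} for r in records]

def privacy_loss(records):
    # Fraction of records left in singleton equivalence classes
    # (i.e., trivially re-identifiable).
    classes = Counter(r["age"] for r in records)
    return sum(1 for r in records if classes[r["age"]] == 1) / len(records)

def info_loss(original, anonymized):
    # Mean absolute generalization error on 'age', normalized by the
    # attribute's range, as a proxy for information loss.
    span = max(r["age"] for r in original) - min(r["age"] for r in original) or 1
    return sum(abs(o["age"] - a["age"])
               for o, a in zip(original, anonymized)) / (len(original) * span)

def candidate_builder(records, k_values):
    # Build the solution space: one anonymized candidate per privacy level.
    return {k: get_anonymized(records, k) for k in k_values}

def optimizer(records, candidates, w_privacy=0.5, w_utility=0.5):
    # Scan the solution space and pick the level minimizing the weighted
    # combination of privacy loss and information loss.
    def objective(anon):
        return w_privacy * privacy_loss(anon) + w_utility * info_loss(records, anon)
    return min(candidates, key=lambda k: objective(candidates[k]))

data = [{"age": a, "diagnosis": d} for a, d in
        [(23, "flu"), (24, "flu"), (25, "cold"),
         (37, "cold"), (38, "flu"), (62, "flu")]]
best_k = optimizer(data, candidate_builder(data, k_values=[1, 2, 3, 4]))
```

On this toy data the optimizer selects an intermediate level: a larger k reduces singleton classes (lower privacy loss) but coarsens the values (higher information loss), so the objective is minimized between the extremes.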


