References

An, Shengnan, Bo Zhou, Zeqi Lin, Qiang Fu, Bei Chen, Nanning Zheng, Weizhu Chen, and Jian-Guang Lou. 2023. “Skill-Based Few-Shot Selection for In-Context Learning.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, edited by Houda Bouamor, Juan Pino, and Kalika Bali, 13472–92. Singapore: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.831.

Artstein, Ron, and Massimo Poesio. 2008. “Survey Article: Inter-Coder Agreement for Computational Linguistics.” Computational Linguistics 34 (4): 555–96. https://www.aclweb.org/anthology/J08-4004.

Bevendorff, Janek, Martin Potthast, Matthias Hagen, and Benno Stein. 2019. “Heuristic Authorship Obfuscation.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, edited by Anna Korhonen, David Traum, and Lluís Màrquez, 1098–108. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1104.

Champion, Pierre. 2023. “Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques.” PhD thesis, Université de Lorraine.

Coulthard, Malcolm, Alison Johnson, and David Wright. 2016. An Introduction to Forensic Linguistics: Language in Evidence. Routledge.

Dwork, Cynthia, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. “Calibrating Noise to Sensitivity in Private Data Analysis.” In Theory of Cryptography, edited by Shai Halevi and Tal Rabin, 265–84. Berlin, Heidelberg: Springer Berlin Heidelberg. https://link.springer.com/chapter/10.1007/11681878_14.

Elazar, Yanai, and Yoav Goldberg. 2018. “Adversarial Removal of Demographic Attributes from Text Data.” In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, edited by Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, 11–21. Brussels, Belgium: Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1002.

Goldsteen, Abigail, Gilad Ezov, Ron Shmelkin, Micha Moffie, and Ariel Farkash. 2022. “Data Minimization for GDPR Compliance in Machine Learning Models.” AI and Ethics 2 (3): 477–91.

Igamberdiev, Timour, and Ivan Habernal. 2023. “DP-BART for Privatized Text Rewriting Under Local Differential Privacy.” In Findings of the Association for Computational Linguistics: ACL 2023, edited by Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, 13914–34. Toronto, Canada: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.874.

Kim, Siwon, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon, and Seong Joon Oh. 2024. “ProPILE: Probing Privacy Leakage in Large Language Models.” Advances in Neural Information Processing Systems 36.

Manzanares-Salor, Benet, David Sánchez, and Pierre Lison. 2024. “Evaluating the Disclosure Risk of Anonymized Documents via a Machine Learning-Based Re-Identification Attack.” Data Mining and Knowledge Discovery, 1–36.

Meisenbacher, Stephen, and Florian Matthes. 2024. “Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text.” In Proceedings of the 19th International Conference on Availability, Reliability and Security. ARES ’24. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3664476.3669926.

Miranda, Michele, Elena Sofia Ruzzetti, Andrea Santilli, Fabio Massimo Zanzotto, Sébastien Bratières, and Emanuele Rodolà. 2024. “Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions.” arXiv Preprint arXiv:2408.05212.

Olstad, Annika Willoch, Anthi Papadopoulou, and Pierre Lison. 2023. “Generation of Replacement Options in Text Sanitization.” In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), edited by Tanel Alumäe and Mark Fishel, 292–300. Tórshavn, Faroe Islands: University of Tartu Library. https://aclanthology.org/2023.nodalida-1.30.

Papadopoulou, Anthi, Pierre Lison, Mark Anderson, Lilja Øvrelid, and Ildikó Pilán. 2023. “Neural Text Sanitization with Privacy Risk Indicators: An Empirical Analysis.” arXiv Preprint arXiv:2310.14312.

Pilán, Ildikó, Pierre Lison, Lilja Øvrelid, Anthi Papadopoulou, David Sánchez, and Montserrat Batet. 2022. “The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization.” Computational Linguistics 48 (4): 1053–1101.

Sánchez, David, and Montserrat Batet. 2016. “C-Sanitized: A Privacy Model for Document Redaction and Sanitization.” Journal of the Association for Information Science and Technology 67 (1): 148–63. https://doi.org/10.1002/asi.23363.

Sousa, Samuel, and Roman Kern. 2023. “How to Keep Text Private? A Systematic Review of Deep Learning Methods for Privacy-Preserving Natural Language Processing.” Artificial Intelligence Review 56 (2): 1427–92.

Weitzenboeck, Emily M., Pierre Lison, Malgorzata Cyndecka, and Malcolm Langford. 2022. “The GDPR and Unstructured Data: Is Anonymization Possible?” International Data Privacy Law 12 (3): 184–206. https://doi.org/10.1093/idpl/ipac008.

Xu, Qiongkai, Lizhen Qu, Chenchen Xu, and Ran Cui. 2019. “Privacy-Aware Text Rewriting.” In Proceedings of the 12th International Conference on Natural Language Generation, edited by Kees van Deemter, Chenghua Lin, and Hiroya Takamura, 247–57. Tokyo, Japan: Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-8633.

Yang, Runqi, Jianhai Zhang, Xing Gao, Feng Ji, and Haiqing Chen. 2019. “Simple and Effective Text Matching with Richer Alignment Features.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, edited by Anna Korhonen, David Traum, and Lluís Màrquez, 4699–709. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1465.