A Strategy for Anonymizing Free-Text Medical Reports Using LLM-AIx
Developing a decision support system for pediatric cardiology case conferences requires the anonymization of 4,000 free-text medical case reports. This paper presents an anonymization strategy using LLM-AIx, a tool for structured information extraction based on large language models (LLMs). The three-step process involves automatic extraction of personally identifiable information (PII) from the reports, evaluation of the results against a manually annotated ground truth, and replacement of identified PII with surrogate values, including controlled date shifting. Initial tests with six example reports revealed challenges in handling multiple occurrences of the same attribute and in keeping replacements consistent. Future work will focus on full pipeline implementation and mapping clinical information to standardized terminologies such as SNOMED CT.
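
The evaluation and replacement steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline and not the LLM-AIx API: the function names (`precision_recall`, `replace_pii`, `shift_dates`), the `DD.MM.YYYY` date format, the surrogate mapping, and the example report are all hypothetical. The extraction step itself is assumed to have already produced a set of PII strings.

```python
import re
from datetime import datetime, timedelta

# Assumption: dates appear as DD.MM.YYYY; real reports may use other formats.
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Compare extracted PII strings against a manually annotated ground truth."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def replace_pii(text: str, surrogates: dict[str, str]) -> str:
    """Replace every occurrence of each PII string with its fixed surrogate."""
    if not surrogates:
        return text
    # Longest keys first, so a full name wins over a first name it contains.
    keys = sorted(surrogates, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass avoids re-replacing text that is already a surrogate.
    return pattern.sub(lambda match: surrogates[match.group(0)], text)


def shift_dates(text: str, offset_days: int) -> str:
    """Shift all dates by the same offset, preserving intervals between them."""
    def _shift(match: re.Match) -> str:
        try:
            original = datetime.strptime(match.group(0), "%d.%m.%Y")
        except ValueError:
            return match.group(0)  # not a valid calendar date: leave untouched
        return (original + timedelta(days=offset_days)).strftime("%d.%m.%Y")

    return DATE_PATTERN.sub(_shift, text)


# Hypothetical report and surrogate mapping, for illustration only.
report = "Anna Muster, born 03.02.2015, seen on 28.12.2023. Anna is stable."
surrogates = {"Anna Muster": "Lena Beispiel", "Anna": "Lena"}
anonymized = shift_dates(replace_pii(report, surrogates), offset_days=10)
```

Using one fixed mapping and one fixed offset per report is one way to approach the consistency problem the abstract mentions: repeated mentions of the same attribute receive the same surrogate, and the time spans between shifted dates stay unchanged.
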
| Author: | Darian Liehr, Theodor Uden, Christian Wartena, Volker Ahlers, Steffen Oeltze-Jafra, Michael Marschollek, Philipp Beerbaum, Oliver J. Bott |
|---|---|
| URN: | urn:nbn:de:bsz:960-opus4-37819 |
| DOI: | https://doi.org/10.25968/opus-3781 |
| Parent Title (German): | KI-Forum 2025 : KI in Forschung und Lehre an Hochschulen |
| Publisher: | HsH Applied Academics |
| Place of publication: | Hannover |
| Editor: | Hanno Homann, Cedric Rohbani, Jens Christian Will |
| Document Type: | Conference Proceeding |
| Language: | English |
| Year of Completion: | 2025 |
| Publishing Institution: | Hochschule Hannover |
| Release Date: | 2025/12/10 |
| Tag: | Anonymization; Case Conference; Large Language Model; Pediatric Cardiology; Personally Identifiable Information |
| GND Keyword: | Großes Sprachmodell; Anonymisierung; Kardiologie |
| Page Number: | 3 |
| First Page: | 62 |
| Last Page: | 64 |
| Link to catalogue: | 1970959789 |
| Institutes: | Fakultät III - Medien, Information und Design |
| | Fakultät IV - Wirtschaft und Informatik |
| | Other institutions |
| | Data\|H - Institute for Applied Data Science Hannover |
| DDC classes: | 370 Education |
| | 004 Computer science |
| | 610 Medicine and health |
| Licence (German): | Creative Commons - CC BY - Attribution 4.0 International |






