AI for Extracting Pre-Analytical Variability Data from Biomedical Literature: Feasibility and Validation
Introduction: The quality and reproducibility of research results obtained from biological samples are strongly influenced by pre-analytical variability, which arises from differing conditions during sample collection, storage and processing. Although numerous studies have investigated the effects of these conditions, standardized and structured reporting remains limited, which hinders systematic evaluation. This study explores the potential of Large Language Models (LLMs) for the structured extraction of pre-analytical variability data from the scientific literature.

Methods: Using a standardized parameter catalog, various LLMs were evaluated with specially designed prompts.

Results: Models such as GPT-4.5, o1, DeepSeek R1, and o3 mini high showed promising performance in contextual understanding and structured output generation, particularly when producing CSV output. However, consistent semantic mapping of complex experimental conditions (e.g., storage time versus temperature) proved challenging.

Conclusion: Targeted token reduction significantly improved extraction quality. Overall, the study shows that LLMs can serve as effective tools for supporting structured data extraction in biomedical contexts, although current limitations in reproducibility and contextual fidelity highlight the continued need for expert oversight.
| Author: | Vicky Scholz, Sven Bichtemann, Oliver J. Bott, Thomas Illig, Sara Haag |
|---|---|
| URN: | urn:nbn:de:bsz:960-opus4-38069 |
| DOI: | https://doi.org/10.25968/opus-3806 |
| DOI original: | https://doi.org/10.3233/SHTI251379 |
| ISBN: | 978-1-64368-615-8 |
| ISSN: | 1879-8365 |
| Parent Title (English): | German Medical Data Sciences 2025: GMDS Illuminates Health (Studies in Health Technology and Informatics ; 331) |
| Publisher: | IOS Press |
| Document Type: | Conference Proceeding |
| Language: | English |
| Year of Completion: | 2025 |
| Publishing Institution: | Hochschule Hannover |
| Release Date: | 2026/01/22 |
| Tag: | Artificial Intelligence (AI); Biological Specimen Banks; Large Language Models (LLM); Metabolomics; Pre-Analytical Phase |
| GND Keyword: | Künstliche Intelligenz (Artificial Intelligence); Großes Sprachmodell (Large Language Model); Metabolomik (Metabolomics); Biobank (Biobank) |
| First Page: | 52 |
| Last Page: | 62 |
| Institutes: | Fakultät III - Medien, Information und Design |
| DDC classes: | 020 Library and information sciences; 610 Medicine and health |
| Licence: | Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) |