AI for Extracting Pre-Analytical Variability Data from Biomedical Literature: Feasibility and Validation

Introduction: The quality and reproducibility of research results obtained from biological samples are significantly influenced by pre-analytical variability, which arises from differing conditions during sample collection, storage, and processing. Although numerous studies have investigated the effects of these conditions, standardized and structured reporting remains limited, which hinders systematic evaluation. This study explores the potential of large language models (LLMs) for the structured extraction of pre-analytical variability data from the scientific literature.

Methods: Using a standardized parameter catalog, several LLMs were evaluated with purpose-designed prompts.

Results: Models such as GPT-4.5, o1, DeepSeek R1, and o3 mini high showed promising performance in contextual understanding and structured output generation, particularly when producing CSV files. However, consistently mapping complex experimental conditions to the correct parameters (e.g., distinguishing storage time from storage temperature) proved challenging.

Conclusion: Targeted token reduction significantly improved extraction quality. Overall, the study shows that LLMs can serve as effective tools for supporting structured data extraction in biomedical contexts, although current limitations in reproducibility and contextual fidelity highlight the continued need for expert oversight.
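The abstract describes LLM-generated CSV output structured by a standardized parameter catalog and reports difficulty in keeping storage time and storage temperature apart. As a rough illustration only, the following sketch shows one way such output might be checked automatically before expert review. It is not the authors' method: the column names, types, and temperature bounds in `PARAMETER_CATALOG` and `TEMPERATURE_RANGE_C` are hypothetical assumptions, and `validate_llm_csv` is an illustrative helper, not part of any published tool.

```python
import csv
import io

# Hypothetical parameter catalog (illustrative assumption, not the study's
# actual catalog): expected CSV column name -> expected value type.
PARAMETER_CATALOG = {
    "analyte": str,
    "sample_type": str,
    "storage_temperature_c": float,
    "storage_time_h": float,
    "reported_effect": str,
}

# Hypothetical plausibility bounds for storage temperature in °C
# (liquid nitrogen up to warm room temperature).
TEMPERATURE_RANGE_C = (-196.0, 40.0)


def validate_llm_csv(csv_text: str) -> list[str]:
    """Return problems found in LLM-generated CSV; an empty list means none detected."""
    problems: list[str] = []
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    header = reader.fieldnames or []

    # Structural check: the header should match the catalog columns.
    missing = [c for c in PARAMETER_CATALOG if c not in header]
    extra = [c for c in header if c not in PARAMETER_CATALOG]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")

    # Row checks; line numbers assume no quoted field spans multiple lines.
    for line_no, row in enumerate(reader, start=2):
        for column, expected in PARAMETER_CATALOG.items():
            value = (row.get(column) or "").strip()
            if expected is not float or not value:
                continue  # only non-empty numeric fields are checked here
            try:
                number = float(value)
            except ValueError:
                problems.append(f"line {line_no}: {column}={value!r} is not numeric")
                continue
            low, high = TEMPERATURE_RANGE_C
            if column == "storage_temperature_c" and not low <= number <= high:
                problems.append(
                    f"line {line_no}: temperature {number} outside {low}..{high} °C "
                    "(possible time/temperature mix-up)"
                )
    return problems


if __name__ == "__main__":
    # Made-up sample rows; the second row puts an hour-like value in the temperature column.
    sample = (
        "analyte,sample_type,storage_temperature_c,storage_time_h,reported_effect\n"
        "glucose,serum,-80,24,stable\n"
        "lactate,plasma,72,4,increase\n"
    )
    for problem in validate_llm_csv(sample):
        print(problem)
    # prints: line 3: temperature 72.0 outside -196.0..40.0 °C (possible time/temperature mix-up)
```

A range check like this can only flag suspicious rows; it cannot decide which value belongs in which column, so flagged rows would still go to a human reviewer, consistent with the abstract's conclusion that expert oversight remains necessary.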

Metadata
Authors: Vicky Scholz, Sven Bichtemann, Oliver J. Bott, Thomas Illig, Sara Haag
URN: urn:nbn:de:bsz:960-opus4-38069
DOI: https://doi.org/10.25968/opus-3806
DOI (original): https://doi.org/10.3233/SHTI251379
ISBN: 978-1-64368-615-8
ISSN: 1879-8365
Parent Title (English): German Medical Data Sciences 2025: GMDS Illuminates Health (Studies in Health Technology and Informatics; 331)
Publisher: IOS Press
Document Type: Conference Proceeding
Language: English
Year of Completion: 2025
Publishing Institution: Hochschule Hannover
Release Date: 2026/01/22
Tags: Artificial Intelligence (AI); Biological Specimen Banks; Large Language Models (LLM); Metabolomics; Pre-Analytical Phase
GND Keywords: Artificial intelligence; Large language model; Metabolomics; Biobank
First Page: 52
Last Page: 62
Institutes: Fakultät III - Medien, Information und Design
DDC Classes: 020 Library and information sciences; 610 Medicine and health
Licence: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)