Toward a service-based workflow for automated information extraction from herbarium specimens
(2018)
Over the past years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, the accompanying label information is still captured predominantly by hand, and this is becoming the principal cost factor. To streamline the capture of herbarium specimen metadata, we specified a formal, extensible workflow that integrates a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) text from specimen images, the identification of regions of interest in images, and the extraction of meaningful information items from the OCR output. These implementations were developed as part of the project "A standardised and optimised process for data acquisition from digital images of herbarium specimens" (StanDAP-Herb), funded by the Deutsche Forschungsgemeinschaft.
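The service-based idea described in this abstract can be illustrated with a minimal sketch. It is not the OpenRefine plugin itself: the function names, the record fields and the sample label text are all hypothetical, and the two services are local stand-ins for what would be remote calls in the real system. The sketch only shows how independent services can be chained so that each one enriches a specimen record with its response.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Service = Callable[[Record], Record]


def ocr_service(record: Record) -> Record:
    # Hypothetical stand-in for a remote OCR service call.
    # The returned label text is made-up placeholder data.
    placeholder_text = "Herbarium label\nLeg. A. Example 1901"
    return {**record, "ocr_text": placeholder_text}


def extract_items_service(record: Record) -> Record:
    # Hypothetical extraction step: pick the collector line,
    # assumed here to start with the abbreviation "Leg.".
    text = str(record.get("ocr_text", ""))
    collector = None
    for line in text.splitlines():
        if line.startswith("Leg."):
            collector = line[len("Leg."):].strip()
    return {**record, "collector": collector}


def run_workflow(record: Record, services: List[Service]) -> Record:
    # Apply each service in order; every step returns an enriched copy.
    for service in services:
        record = service(record)
    return record


result = run_workflow(
    {"image": "specimen-0001.jpg"},
    [ocr_service, extract_items_service],
)
```

Because each step only reads and extends a plain record, further services (for example a region-of-interest detector) could be added to the list without changing the existing ones.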
Recent developments in the field of deep learning have shown promising advances for a wide range of historically difficult computer vision problems. Using advanced deep learning techniques, researchers can perform high-quality single-image super-resolution, i.e., increase the resolution of a given image without the major losses in image quality usually encountered with traditional approaches such as standard interpolation. This thesis examines deep learning super-resolution using convolutional neural networks and investigates whether the same deep learning models can be used to improve OCR results for low-quality text images.
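For orientation, the "standard interpolation" baseline mentioned in this abstract can be sketched in its simplest form, nearest-neighbour upscaling. This is an illustrative assumption about what such a baseline looks like, not the method or code used in the thesis; the function name and the list-of-lists image representation are chosen here for the example only.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def upscale_nearest(image: Image, factor: int) -> Image:
    # Nearest-neighbour interpolation: every pixel is repeated
    # `factor` times horizontally and vertically.
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upscaled: Image = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        for _ in range(factor):
            upscaled.append(list(wide_row))
    return upscaled


small = [[0, 255],
         [255, 0]]
large = upscale_nearest(small, 2)
```

Such interpolation adds pixels but no new detail, which is why edges of small text stay blocky or blurred; learned super-resolution models aim to reconstruct plausible detail instead.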