020 Library and information sciences
Keywords
- Annotation (1)
- Concreteness (1)
- Indexing <Subject indexing> (1)
- Information Retrieval (1)
- Subject indexing (1)
- Concrete noun <Linguistics> (1)
- Cultural heritage (1)
- Open Source (1)
- Quality assurance (1)
- Semantics (1)
In this poster we present the ongoing development of an integrated free and open source toolchain for semantic annotation of digitised cultural heritage. The toolchain development involves the specification of a common data model that aims to increase interoperability across diverse datasets and to enable new collaborative research approaches.
The concreteness of words has been measured and used in psycholinguistics for decades. More recently, it has also been used in retrieval and NLP tasks. For English, a number of well-known datasets with average values for perceived concreteness have been established.
We give an overview of the available datasets for German and their correlations, and we evaluate prediction algorithms for the concreteness of German words. We show that these algorithms achieve results similar to those obtained on English datasets. Moreover, we show that, for all datasets, there is no significant difference between a prediction model that uses regression with word embeddings as features and a prediction algorithm based on word similarity according to the same embeddings.
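The two prediction approaches compared in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the three-dimensional "embeddings", the ratings on a 1 to 5 scale, the choice of ridge regression, the cosine-weighted nearest-neighbour rule, and all function names are assumptions made up for illustration.

```python
import math

# Toy data: invented 3-dimensional "embeddings" and invented concreteness
# ratings (1 = abstract, 5 = concrete). Not taken from any published dataset.
EMBEDDINGS = {
    "Tisch": [0.9, 0.1, 0.0],
    "Hund": [0.8, 0.2, 0.1],
    "Apfel": [0.7, 0.3, 0.0],
    "Freiheit": [0.1, 0.9, 0.2],
    "Idee": [0.0, 0.8, 0.3],
    "Hoffnung": [0.2, 0.9, 0.1],
}
RATINGS = {"Tisch": 4.8, "Hund": 4.7, "Apfel": 4.9,
           "Freiheit": 1.6, "Idee": 1.5, "Hoffnung": 1.4}

XS = [EMBEDDINGS[w] for w in EMBEDDINGS]
YS = [RATINGS[w] for w in EMBEDDINGS]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(xs, ys, lam=0.1):
    """Approach 1: ridge regression with the embedding dimensions as features.

    A constant 1.0 is appended as a bias feature; for simplicity the bias
    is regularised like every other weight.
    """
    rows = [x + [1.0] for x in xs]
    d = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(a, b)


def predict_ridge(weights, x):
    return sum(w * v for w, v in zip(weights, x + [1.0]))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_similarity(x, xs, ys, k=3):
    """Approach 2: similarity-weighted mean rating of the k nearest words."""
    nearest = sorted(((cosine(x, xi), yi) for xi, yi in zip(xs, ys)),
                     reverse=True)[:k]
    total = sum(max(s, 0.0) for s, _ in nearest)
    if total == 0.0:
        return sum(ys) / len(ys)  # fall back to the global mean
    return sum(max(s, 0.0) * y for s, y in nearest) / total


weights = fit_ridge(XS, YS)
concrete_like = [0.85, 0.15, 0.05]  # hypothetical unrated word vector
abstract_like = [0.10, 0.85, 0.20]  # hypothetical unrated word vector

for name, vec in [("concrete-like", concrete_like),
                  ("abstract-like", abstract_like)]:
    print(name,
          round(predict_ridge(weights, vec), 2),
          round(predict_similarity(vec, XS, YS), 2))
```

Both predictors use the same vectors as their only input, which is one plausible reading of why the two approaches can end up with comparable results. A real evaluation would use pretrained embeddings and held-out human ratings.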