A Probabilistic Morphology Model for German Lemmatization

  • Lemmatization is a central task in many NLP applications. Despite its importance, the number of freely available and easy-to-use tools for German is very limited. To fill this gap, we developed a simple lemmatizer that can be trained on any lemmatized corpus. For a full-form word, the tagger tries to find the sequence of morphemes that is most likely to generate that word. From this sequence of tags we can easily derive the stem, the lemma, and the part of speech (PoS) of the word. We show (i) that the quality of this approach is comparable to state-of-the-art methods and (ii) that the results of PoS tagging improve when the morphological analysis of each word is included.
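
The idea of searching for the most likely morpheme sequence can be illustrated with a minimal sketch. It is not the model from the paper: the lexicon, the tag names (NSTEM, NPL, and so on), and all probabilities below are invented for illustration, and the search is a plain Viterbi-style dynamic program over a tag-bigram Markov model, assumed here only because the record lists "Markov Models" as a tag.

```python
import math

# Hypothetical toy lexicon: surface morph -> list of (tag, emission probability).
LEXICON = {
    "kind": [("NSTEM", 0.6)],
    "er": [("NPL", 0.5), ("ADJ_COMP", 0.2)],
    "n": [("NDAT", 0.4)],
    "geh": [("VSTEM", 0.7)],
    "st": [("V2SG", 0.5)],
}

# Hypothetical tag-bigram transition probabilities.
# "<s>" and "</s>" mark the start and end of a word.
TRANSITIONS = {
    ("<s>", "NSTEM"): 0.5,
    ("<s>", "VSTEM"): 0.5,
    ("NSTEM", "NPL"): 0.4,
    ("NSTEM", "</s>"): 0.5,
    ("NPL", "NDAT"): 0.3,
    ("NPL", "</s>"): 0.6,
    ("NDAT", "</s>"): 1.0,
    ("VSTEM", "V2SG"): 0.4,
    ("V2SG", "</s>"): 1.0,
}


def analyze(word):
    """Return (log_prob, [(morph, tag), ...]) for the best analysis, or None."""
    n = len(word)
    # best[i][tag] = (log probability, path) of the best analysis that
    # covers word[:i] and whose last morph carries `tag`.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(n):
        for prev_tag, (log_prob, path) in best[i].items():
            for j in range(i + 1, n + 1):
                morph = word[i:j]
                for tag, emission in LEXICON.get(morph, []):
                    transition = TRANSITIONS.get((prev_tag, tag))
                    if transition is None:
                        continue  # this tag may not follow prev_tag
                    candidate = log_prob + math.log(transition) + math.log(emission)
                    if tag not in best[j] or candidate > best[j][tag][0]:
                        best[j][tag] = (candidate, path + [(morph, tag)])
    # Close each complete analysis with the end-of-word transition.
    finals = []
    for tag, (log_prob, path) in best[n].items():
        transition = TRANSITIONS.get((tag, "</s>"))
        if transition is not None:
            finals.append((log_prob + math.log(transition), path))
    return max(finals, key=lambda item: item[0]) if finals else None


def stem_of(path):
    """Return the first morph whose (hypothetical) tag marks it as a stem."""
    for morph, tag in path:
        if tag.endswith("STEM"):
            return morph
    return None


result = analyze("kindern")
if result is not None:
    print(result[1], stem_of(result[1]))
```

In this sketch the stem is read off the tag sequence directly; deriving a lemma and a part of speech would need further rules (for example, mapping a noun stem to its capitalized citation form), which are left out here.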

Metadata
Author:Christian Wartena
URN:urn:nbn:de:bsz:960-opus4-15271
DOI:https://doi.org/10.25968/opus-1527
Parent Title (English):Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019)
Document Type:Conference Proceeding
Language:English
Year of Completion:2019
Publishing Institution:Hochschule Hannover
Release Date:2019/12/12
Tag:German; Lemmatization; Markov Models; POS Tagging
GND Keyword:Computerlinguistik
First Page:40
Last Page:49
Institutes:Fakultät III - Medien, Information und Design
DDC classes:020 Library and information sciences
400 Language, Linguistics
Licence:Creative Commons - CC BY-NC-SA - Attribution - NonCommercial - ShareAlike 4.0 International