TY  - CHAP
U1  - Conference publication
A1  - Wartena, Christian
T1  - A Probabilistic Morphology Model for German Lemmatization
T2  - Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019)
N2  - Lemmatization is a central task in many NLP applications. Despite this importance, the number of (freely) available and easy-to-use tools for German is very limited. To fill this gap, we developed a simple lemmatizer that can be trained on any lemmatized corpus. For a full-form word, the tagger tries to find the sequence of morphemes that is most likely to generate that word. From this sequence of tags, we can easily derive the stem, the lemma and the part of speech (PoS) of the word. We show (i) that the quality of this approach is comparable to state-of-the-art methods and (ii) that we can improve the results of PoS tagging when we include the morphological analysis of each word.
KW  - Lemmatization
KW  - German
KW  - POS Tagging
KW  - Markov Models
KW  - Computational Linguistics
Y1  - 2019
UN  - https://nbn-resolving.org/urn:nbn:de:bsz:960-opus4-15271
U6  - https://doi.org/10.25968/opus-1527
DO  - https://doi.org/10.25968/opus-1527
SP  - 40
EP  - 49
ER  -