A Probabilistic Morphology Model for German Lemmatization (2019)

Lemmatization is a central task in many NLP applications. Despite this importance, the number of (freely) available and easy-to-use tools for German is very limited. To fill this gap, we developed a simple lemmatizer that can be trained on any lemmatized corpus. For a full-form word, the tagger tries to find the sequence of morphemes that is most likely to generate that word. From this sequence of tags we can easily derive the stem, the lemma and the part of speech (PoS) of the word. We show (i) that the quality of this approach is comparable to state-of-the-art methods and (ii) that we can improve the results of PoS tagging when we include the morphological analysis of each word.

Verbal Idioms: Concrete Nouns in Abstract Contexts (2021)

Charbonnier, Jean ; Wartena, Christian

In this paper, we present our approach for the KONVENS 2021 shared task Disambiguation of German Verbal Idioms. Our model is a decision tree-based classifier that uses static word embeddings and computed concreteness values to predict whether a verbal idiom is used figuratively or literally.

TeCoPhy: A Text Corpus of German Physics Texts (2023)

Lecarda Fontanella, Vitor Lécio ; Bleckmann, Tom ; Dieckhoff, Lukas ; Friege, Gunnar ; Wartena, Christian

To learn a subject, the acquisition of the associated technical language is important. Despite this widely accepted importance of learning the technical language, hardly any published studies describe the characteristics of most technical languages that students are supposed to learn. This might largely be due to the absence of specialized text corpora for studying such languages at the lexical, syntactic and textual levels. In the present paper we describe a corpus of German physics texts that can be used to study the language used in physics. A large and a small variant have been compiled. The small version of the corpus consists of 5.3 million words and is available on request.
