To ensure validity in legal texts such as contracts and case law, lawyers rely on standardised formulations that are carefully written but also act as a kind of code whose meaning and function are known to all legal experts. Using directed (acyclic) graphs to represent standardised text fragments, we are able to capture variations in time specifications, slight rephrasings, names, places and also OCR errors. We show how such text fragments can be found by sentence clustering, pattern detection and pattern clustering. To test the proposed methods, we use two corpora of German contracts and court decisions, specially compiled for this purpose. However, the entire process for representing standardised text fragments is language-agnostic. We analyse and compare both corpora, give a quantitative and qualitative analysis of the text fragments found, and present a number of examples from both corpora.
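As a rough illustration of the graph idea, the following minimal sketch stores one standardised fragment as a directed acyclic graph with token-labelled edges and checks whether a sentence follows some path through it. The class name `FragmentGraph`, the wildcard label `*` and the English example sentence are assumptions made for this sketch; the abstract does not specify how the graphs are actually built or matched.

```python
class FragmentGraph:
    """Hypothetical DAG for one text fragment: edges carry token labels."""

    WILDCARD = "*"  # assumed label for variable slots such as dates or names

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.edges = {}  # node -> list of (label, next_node)

    def add_edge(self, src, label, dst):
        self.edges.setdefault(src, []).append((label, dst))

    def accepts(self, tokens):
        """Return True if some start-to-end path spells out `tokens`."""
        frontier = {self.start}
        for token in tokens:
            frontier = {
                dst
                for node in frontier
                for label, dst in self.edges.get(node, [])
                if label == token or label == self.WILDCARD
            }
            if not frontier:
                return False
        return self.end in frontier


# Toy fragment: "This (contract|agreement) (ends|terminates) on <anything>"
graph = FragmentGraph(start=0, end=5)
graph.add_edge(0, "This", 1)
graph.add_edge(1, "contract", 2)
graph.add_edge(1, "agreement", 2)   # slight rephrasing
graph.add_edge(2, "ends", 3)
graph.add_edge(2, "terminates", 3)  # slight rephrasing
graph.add_edge(3, "on", 4)
graph.add_edge(4, "*", 5)           # variable time specification

print(graph.accepts("This agreement terminates on 2020-01-31".split()))
print(graph.accepts("This contract starts on 2020-01-31".split()))
```

Parallel edges between the same pair of nodes model alternative wordings, while the wildcard edge stands in for slots whose content varies from document to document.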
The concreteness of words has been measured and used in psycholinguistics for decades. Recently, it has also been used in retrieval and NLP tasks. For English, a number of well-known datasets have been established with average values for perceived concreteness.
We give an overview of available datasets for German and their correlations, and evaluate prediction algorithms for the concreteness of German words. We show that these algorithms achieve results similar to those for English datasets. Moreover, we show that, for all datasets, there are no significant differences between a prediction model based on regression with word embeddings as features and a prediction algorithm based on word similarity according to the same embeddings.
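The two kinds of predictor compared above can be sketched as follows. This is only an illustration under stated assumptions: the embeddings and ratings are synthetic random data, ridge regression stands in for the unspecified regression model, and a similarity-weighted average over the k nearest training words stands in for the unspecified similarity-based algorithm. All function names are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression without intercept (assumed model)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def predict_ridge(w, X):
    return X @ w


def predict_by_similarity(X_train, y_train, X_test, k=5):
    """Average the ratings of the k most cosine-similar training words."""
    def unit(M):
        return M / np.linalg.norm(M, axis=1, keepdims=True)

    sims = unit(X_test) @ unit(X_train).T        # shape (n_test, n_train)
    top = np.argsort(-sims, axis=1)[:, :k]       # indices of k nearest words
    rows = np.arange(len(X_test))[:, None]
    # Negative similarities get (almost) zero weight.
    weights = np.clip(sims[rows, top], 0.0, None) + 1e-9
    return (weights * y_train[top]).sum(axis=1) / weights.sum(axis=1)


# Synthetic stand-ins for word embeddings and concreteness ratings.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = X @ rng.normal(size=16) + 0.1 * rng.normal(size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

w = fit_ridge(X_train, y_train)
ridge_pred = predict_ridge(w, X_test)
knn_pred = predict_by_similarity(X_train, y_train, X_test)

# Pearson correlation with the held-out ratings for each predictor.
print(np.corrcoef(ridge_pred, y_test)[0, 1])
print(np.corrcoef(knn_pred, y_test)[0, 1])
```

Because the synthetic ratings are generated by a linear function of the embeddings, the numbers printed here say nothing about real concreteness data; the sketch only shows how both predictors consume the same embeddings.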