TY - CPAPER
U1 - Conference publication
A1 - Josi, Frieda
A1 - Wartena, Christian
A1 - Heid, Ulrich
ED - Barney Smith, Elisa H.
ED - Pal, Umapada
T1 - Representing Standard Text Formulations as Directed Graphs
T2 - Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol. 12917
N2 - In order to ensure validity in legal texts like contracts and case law, lawyers rely on standardised formulations that are written carefully but also represent a kind of code with a meaning and function known to all legal experts. Using directed (acyclic) graphs to represent standardized text fragments, we are able to capture variations concerning time specifications, slight rephrasings, names, places and also OCR errors. We show how we can find such text fragments by sentence clustering, pattern detection and clustering patterns. To test the proposed methods, we use two corpora of German contracts and court decisions, specially compiled for this purpose. However, the entire process for representing standardised text fragments is language-agnostic. We analyze and compare both corpora, give a quantitative and qualitative analysis of the text fragments found, and present a number of examples from both corpora.
KW - Graph-based Text Representations
KW - Legal Writings
KW - Standardised formulation
KW - Directed acyclic graph
KW - Language norm
KW - Non-fiction text
KW - Jurisprudence
Y1 - 2021
UN - https://nbn-resolving.org/urn:nbn:de:bsz:960-opus4-20735
SN - 0302-9743
SS - 0302-9743
SN - 978-3-030-86158-2
SB - 978-3-030-86158-2
U6 - https://doi.org/10.25968/opus-2073
DO - https://doi.org/10.25968/opus-2073
SP - 475
EP - 487
PB - Springer
CY - Cham
ER -