Structural Analysis of Contract Renewals
This paper sketches an automated procedure for comparing different versions of a contract. The contract texts are PDF files with differing structure and layout, which we convert into structured XML files by identifying and classifying text boxes. A classifier trained on manually annotated contracts achieves an accuracy of 87% on this task. We then align the contract versions and assign aligned text fragments to similarity classes that support the manual comparison of changes between document versions. The main challenges are OCR errors and differing layouts of identical or similar texts. We demonstrate the procedure on freely available German-language contracts from the City of Hamburg. The methods, however, are language-agnostic and can be applied to other contracts as well.
Author: | Frieda Josi, Christian Wartena |
---|---|
URN: | urn:nbn:de:bsz:960-opus4-15139 |
URL: | http://ceur-ws.org/Vol-2482/paper31.pdf |
DOI: | https://doi.org/10.25968/opus-1513 |
ISSN: | 1613-0073 |
Parent Title (English): | Proceedings of the CIKM 2018 Workshops, Torino, Italy, October 22, 2018. |
Editor: | Alfredo Cuzzocrea, Francesco Bonchi, Dimitris Gunopulos |
Document Type: | Conference Proceeding |
Language: | English |
Year of Completion: | 2019 |
Publishing Institution: | Hochschule Hannover |
Release Date: | 2019/11/20 |
Tag: | Contract Analysis; Structural Analysis |
GND Keyword: | Vertrag; Vergleich; Fassung; PDF <Dateiformat>; XML |
Link to catalogue: | 168818158X |
Institutes: | Fakultät III - Medien, Information und Design |
DDC classes: | 020 Library and information sciences |
Licence (German): | Creative Commons - CC BY - Namensnennung 4.0 International |