TY  - THES
A1  - Josi, Frieda
T1  - Textbasierte Annotation von Abbildungen mit Kategorien von Wikimedia
N2  - This master's thesis deals with the automatic annotation of images using the Wikipedia category system. The annotation is carried out using the image captions and their respective text references. A suitable category is suggested for each existing image. The images are illustrations from scientific articles published in open access journals. The aim of the work is to develop a conceptual procedure and to carry out and evaluate it on the basis of a selected number of images. The images are to be made available for further research and for projects of the Wikimedia Foundation. The annotation method is used in the NOA project (Nachnutzung von Open Access Abbildungen, reuse of open access images).
KW  - Annotation
KW  - Text Mining
KW  - Open Science
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:960-opus4-11949
ER  - 

TY  - CHAP
A1  - Josi, Frieda
A1  - Wartena, Christian
ED  - Cuzzocrea, Alfredo
ED  - Bonchi, Francesco
ED  - Gunopulos, Dimitris
T1  - Structural Analysis of Contract Renewals
T2  - Proceedings of the CIKM 2018 Workshops, Torino, Italy, October 22, 2018.
N2  - In the present paper we sketch an automated procedure to compare different versions of a contract. The contract texts used for this purpose are PDF files with differing structures that are converted into structured XML files by identifying and classifying text boxes. A classifier trained on manually annotated contracts achieves an accuracy of 87% on this task. We align contract versions and classify aligned text fragments into different similarity classes that support the manual comparison of changes between document versions. The main challenges are dealing with OCR errors and with different layouts of identical or similar texts. We demonstrate the procedure using some freely available contracts from the City of Hamburg written in German. The methods, however, are language agnostic and can be applied to other contracts as well.
KW  - Structural Analysis
KW  - Contract Analysis
KW  - Vertrag
KW  - Vergleich
KW  - Fassung
KW  - PDF
KW  - XML
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:960-opus4-15139
UR  - http://ceur-ws.org/Vol-2482/paper31.pdf
SN  - 1613-0073
ER  - 

TY  - CHAP
A1  - Josi, Frieda
A1  - Wartena, Christian
A1  - Charbonnier, Jean
T1  - Text-based annotation of scientific images using Wikimedia categories
T2  - Elloumi M. et al. (eds): Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol. 903
N2  - The reuse of scientific raw data is a key demand of Open Science. In the project NOA we foster reuse of scientific images by collecting and uploading them to Wikimedia Commons. In this paper we present a text-based annotation method that proposes Wikipedia categories for open access images. The assigned categories can be used for image retrieval or to upload images to Wikimedia Commons. The annotation consists of two phases: extracting salient keywords and mapping these keywords to categories. The results are evaluated on a small set of open access images that were manually annotated.
KW  - Scientific image search
KW  - Text annotation
KW  - Wikipedia categories
Y1  - 2018
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:960-opus4-12488
SN  - 978-3-319-99132-0
SN  - 978-3-319-99133-7
N1  - The final authenticated version is available online at https://doi.org/10.1007/978-3-319-99133-7_20
SP  - 243
EP  - 253
PB  - Springer
CY  - Cham
ER  - 

TY  - CHAP
A1  - Josi, Frieda
A1  - Wartena, Christian
A1  - Heid, Ulrich
T1  - Detecting Paraphrases of Standard Clause Titles in Insurance Contracts
T2  - RELATIONS - Workshop on meaning relations between phrases and sentences (May 23, 2019, Gothenburg, Sweden)
N2  - For the analysis of contract texts, validated model texts, such as model clauses, can be used to identify the clauses used in a contract. This paper investigates how the similarity between titles of model clauses and headings extracted from contracts can be computed, and which similarity measure is most suitable for this. To calculate the similarities between title pairs, we tested several variants of string similarity and token-based similarity. We also compare two additional semantic similarity measures based on word embeddings, one using pre-trained embeddings and one using word embeddings trained on contract texts. The identification of the model clause title can be used as a starting point for mapping clauses found in contracts to verified clauses.
KW  - Text Similarity
KW  - Paraphrase Similarity
KW  - Similarity Measures
KW  - Contract Analysis
KW  - Title Matching
KW  - Paraphrase
KW  - Vertragsklausel
KW  - Ähnlichkeit
KW  - Versicherungsvertrag
Y1  - 2019
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bsz:960-opus4-13375
UR  - https://www.aclweb.org/anthology/W19-0803
SN  - 978-1-950737-22-2
SP  - 23
EP  - 33
ER  - 