Image captions in scientific papers are usually complementary to the images. Consequently, the captions contain many terms that do not refer to concepts visible in the image. We conjecture that it is possible to distinguish between these two types of terms in an image caption by analysing the text alone. To examine this, we evaluated different features. The dataset we used to compute tf.idf values, word embeddings and concreteness values contains over 700,000 scientific papers with over 4.6 million images. The evaluation was done on a manually annotated subset of 329 images. Additionally, we trained a support vector machine to predict whether a term likely refers to a visible concept or not. We show that the concreteness of terms is a very important feature for identifying terms in captions and context that refer to concepts visible in images.
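The classification step can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the feature values (tf.idf, concreteness, embedding norm) and labels below are invented toy data.

```python
# Hypothetical sketch: classifying caption terms as "visible" vs "not visible"
# using an SVM over per-term features. All values are invented toy data.
from sklearn.svm import SVC

# Each term is described by [tf.idf, concreteness, embedding norm] (illustrative)
X_train = [
    [0.12, 4.8, 1.0],  # "microscope"  -> visible
    [0.30, 4.5, 0.9],  # "arrow"       -> visible
    [0.05, 1.9, 1.1],  # "hypothesis"  -> not visible
    [0.08, 2.2, 0.8],  # "method"      -> not visible
]
y_train = [1, 1, 0, 0]  # 1 = refers to a visible concept

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Predict for a new term with high concreteness
print(clf.predict([[0.15, 4.6, 1.0]]))
```

In this toy setup the concreteness feature alone separates the classes, which mirrors the abstract's finding that concreteness is the most important signal.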
Generalised legal documents, in which the positions of a contract's individual components within the text are known, can be used, first, to support the approval process for new contracts in an automated way and, second, to serve as a contract generator that provides pre-selected new legal documents. In this contribution we use known legal texts to show how formulaic text passages can be identified and how frequent individual variants can be classified so that they can serve as template sections. We present areas of application and point out the existing potential for legal tech applications.
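One simple way to flag formulaic passages, shown here as a hypothetical illustration rather than the paper's method, is to count how often identical sentences recur across a contract collection; sentences that appear in most documents are candidate template sections.

```python
# Hypothetical illustration (not the paper's method): frequent identical
# sentences across a collection are candidate formulaic template sections.
from collections import Counter

contracts = [
    "The tenant shall pay the rent monthly. The deposit is 500 euros.",
    "The tenant shall pay the rent monthly. The deposit is 800 euros.",
    "The tenant shall pay the rent monthly. Pets are not allowed.",
]

# Count sentence occurrences across all contracts
counts = Counter(
    s.strip() for doc in contracts for s in doc.split(".") if s.strip()
)

# Sentences appearing in at least 80% of the contracts are likely formulaic
formulaic = [s for s, c in counts.items() if c >= len(contracts) * 0.8]
print(formulaic)
```

The sentences that vary between contracts (deposit amounts, special clauses) are exactly the "individual variants" the abstract describes, and their positions relative to the formulaic passages are what a contract generator would track.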
Data and Information Science: Book of Abstracts at BOBCATSSS 2022 Hybrid Conference, 23rd - 25th of May 2022, Debrecen.
This year marks the 30th anniversary of BOBCATSSS, an international, annual symposium designed for librarians and information professionals working in a rapidly changing environment. Over the past 30 years, the conference has featured exciting topics, great venues, interested guests and engaging presenters.
This year, for the first time, we present the topics of the many papers collected in this Book of Abstracts both in person at the University of Debrecen and in hybrid form. The Book of Abstracts provides an overview of all presentations given at BOBCATSSS. Presentations are listed in alphabetical order by title and include speeches, Pecha Kuchas, posters and workshops.
The theme of BOBCATSSS is Data and Information Science. Data and information are the basis for decisions and processes in business, politics and science, and they are particularly important in the current era of digital transformation. This is exactly where this year's subthemes come in: they deal with data science, openness, and institutional roles.
In this paper we investigate how concreteness and abstractness are represented in word embedding spaces. Using data for English and German, we show that concreteness and abstractness can be determined independently and turn out to be completely opposite directions in the embedding space. Various methods can be used to determine the direction of concreteness, all resulting in roughly the same vector. Although concreteness is a central aspect of the meaning of words and can be detected clearly in embedding spaces, it does not seem as easy to subtract or add concreteness to words in order to obtain other words or word senses, as can be done with a semantic property like gender.
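One of the simplest ways to estimate such a direction, sketched here as an assumption rather than the paper's exact procedure, is the difference between the mean vectors of concrete and abstract seed words. The embeddings below are random toy vectors standing in for real word embeddings.

```python
# Illustrative sketch (not the paper's code): estimate a "concreteness
# direction" as mean(concrete seeds) - mean(abstract seeds).
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy embeddings: concrete words share a small offset along one axis,
# standing in for the real structure found in trained embeddings.
offset = np.zeros(dim)
offset[0] = 1.0
concrete = {w: rng.normal(size=dim) + offset for w in ["stone", "apple", "chair"]}
abstract = {w: rng.normal(size=dim) - offset for w in ["idea", "freedom", "truth"]}

# Direction of concreteness: difference of class means, normalised
direction = np.mean(list(concrete.values()), axis=0) - np.mean(list(abstract.values()), axis=0)
direction /= np.linalg.norm(direction)

# Projecting a word vector onto this direction gives a concreteness score
def score(v):
    return float(v @ direction)

print(score(concrete["stone"]), score(abstract["idea"]))
```

Concrete words then project to higher values than abstract words, which is the sense in which concreteness and abstractness form opposite directions.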
Legal documents often have a complex layout with many different headings, headers and footers, side notes, etc. For further processing, it is important to extract these individual components correctly from a legally binding document, for example a signed PDF. A common approach is to classify each (text) region of a page using its geometric and textual features. This approach works well when the training and test data have a similar structure and when the documents of a collection to be analyzed have a rather uniform layout. We show that the use of global page properties can improve the accuracy of text element classification: we first classify each page into one of three layout types. After that, we can train a classifier for each of the three page types and thereby improve the accuracy on a manually annotated collection of 70 legal documents consisting of 20,938 text elements. When we split by page type, we achieve an improvement from 0.95 to 0.98 for single-column pages with left marginalia and from 0.95 to 0.96 for double-column pages. We developed our own feature-based method for page layout detection, which we benchmark against a standard implementation of a CNN image classifier. The approach presented here is based on a corpus of freely available German contracts and general terms and conditions.
Both the corpus and all manual annotations are made freely available. The method is language-agnostic.
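The two-stage idea can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the page and element features, layout type names, and labels are all invented toy data.

```python
# Hypothetical sketch of two-stage classification: first predict a page
# layout type, then route each text element to a classifier trained for
# that page type. All features and labels are invented toy data.
from sklearn.tree import DecisionTreeClassifier

# Stage 1: page features (e.g. [column count, left margin width]) -> layout type
page_X = [[1, 120], [1, 15], [2, 20], [2, 25], [1, 110]]
page_y = ["single_marginalia", "single", "double", "double", "single_marginalia"]
page_clf = DecisionTreeClassifier().fit(page_X, page_y)

# Stage 2: one element classifier per layout type
# (element features, e.g. [x position, font size] -> element class)
element_clfs = {
    t: DecisionTreeClassifier().fit(X, y)
    for t, (X, y) in {
        "single": ([[50, 12], [300, 10]], ["heading", "body"]),
        "single_marginalia": ([[10, 9], [200, 10]], ["marginal_note", "body"]),
        "double": ([[30, 10], [400, 10]], ["left_column", "right_column"]),
    }.items()
}

def classify_element(page_features, element_features):
    """Route an element to the classifier for its page's predicted layout."""
    page_type = page_clf.predict([page_features])[0]
    return element_clfs[page_type].predict([element_features])[0]

print(classify_element([1, 115], [12, 9]))
```

Splitting by page type lets each stage-2 classifier specialise, which is where the reported accuracy gains come from.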
This paper describes the approach of Hochschule Hannover to the SemEval 2013 task Evaluating Phrasal Semantics. In order to compare a single word with a two-word phrase, we compute various distributional similarities, among them a new similarity measure based on the Jensen-Shannon divergence with a correction for frequency effects. The classification is done by a support vector machine that uses all similarities as features. The approach turned out to be the most successful one in the task.
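The underlying divergence can be computed as below. This sketch shows the standard Jensen-Shannon divergence between two context-word distributions; the paper's measure additionally applies a frequency correction not shown here, and the distributions are toy data.

```python
# Standard Jensen-Shannon divergence between two discrete distributions,
# usable as a distributional (dis)similarity between a word and a phrase.
# The paper's frequency correction is not included in this sketch.
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by 1 (log base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy context-word distributions for a word and a phrase
word_dist = [0.5, 0.3, 0.2, 0.0]
phrase_dist = [0.4, 0.3, 0.2, 0.1]
print(jsd(word_dist, phrase_dist))          # small -> similar distributions
print(jsd([1.0, 0, 0, 0], [0, 0, 0, 1.0]))  # 1.0 -> maximally different
```

Because the divergence is symmetric and bounded, it behaves well as one feature among several for the SVM.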
This paper presents a possible extension of the formalism of linear indexed grammars. The extension is based on the use of tuples of pushdowns, instead of a single pushdown, to store indices during a derivation. If a restriction on the accessibility of the pushdowns is imposed, it can be shown that the resulting formalisms give rise to a hierarchy of languages that is equivalent to a hierarchy defined by Weir. For this equivalence, which was already known for a slightly different formalism, this paper gives a new proof. Since all languages of Weir's hierarchy are known to be mildly context-sensitive, the proposed extensions of LIGs become comparable with extensions of tree adjoining grammars and head grammars.
We present a simple method to find topics in user reviews that accompany ratings for products or services. Standard topic analysis performs sub-optimally on such data, since the word distributions in the documents are determined not only by the topics but also by the sentiment. We reduce the influence of the sentiment on the topic selection by adding two explicit topics, representing positive and negative sentiment. We evaluate the proposed method on a set of over 15,000 hospital reviews. We show that the proposed method, Latent Semantic Analysis with explicit word features, finds topics with a much smaller sentiment bias than other similar methods.
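For context, plain LSA derives topics from the singular value decomposition of a term-document matrix. The sketch below shows only this standard decomposition on invented toy counts; the paper's variant additionally fixes explicit dimensions for positive and negative sentiment words, which is not reproduced here.

```python
# Minimal plain-LSA sketch (not the paper's extended variant): topics are
# the leading left singular vectors of a term-document count matrix.
import numpy as np

terms = ["friendly", "rude", "surgery", "parking", "ward"]
# Columns = documents (reviews); entries = term counts (toy data)
A = np.array([
    [2, 0, 1, 0],   # friendly
    [0, 3, 0, 1],   # rude
    [1, 1, 4, 0],   # surgery
    [0, 0, 0, 3],   # parking
    [1, 2, 3, 0],   # ward
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each column of U weights the terms for one "topic"
k = 2  # keep the two strongest topics
for i in range(k):
    top = np.argsort(-np.abs(U[:, i]))[:2]
    print(f"topic {i}:", [terms[j] for j in top])
```

In plain LSA, sentiment words like "friendly" and "rude" leak into the leading topics; reserving explicit sentiment dimensions, as the paper proposes, keeps the remaining topics closer to actual subject matter.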
The concreteness of words has been measured and used in psycholinguistics for decades. Recently, it has also been used in retrieval and NLP tasks. For English, a number of well-known datasets with average values for perceived concreteness have been established.
We give an overview of the available datasets for German, analyse their correlation, and evaluate prediction algorithms for the concreteness of German words. We show that these algorithms achieve results similar to those for the English datasets. Moreover, we show that for all datasets there are no significant differences between a prediction model based on a regression model using word embeddings as features and a prediction algorithm based on word similarity according to the same embeddings.
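The two compared strategies can be sketched side by side. This is a hedged illustration on invented toy embeddings and ratings, not the paper's experimental setup: (a) a regression model from embedding features to a concreteness rating, and (b) a similarity-based prediction that averages the ratings of the nearest rated words in the same embedding space.

```python
# Hedged sketch of the two prediction strategies compared in the abstract,
# on invented toy embeddings and "concreteness ratings".
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
dim, n = 20, 200
X = rng.normal(size=(n, dim))   # toy word embeddings of rated words
w_true = rng.normal(size=dim)
y = X @ w_true                  # toy concreteness ratings (linear by design)

x_new = rng.normal(size=dim)    # embedding of an unrated word

# (a) Regression model on embedding features
reg = Ridge(alpha=1.0).fit(X, y)
pred_regression = float(reg.predict([x_new])[0])

# (b) Nearest-neighbour average by cosine similarity in the same space
sims = (X @ x_new) / (np.linalg.norm(X, axis=1) * np.linalg.norm(x_new))
nearest = np.argsort(-sims)[:10]
pred_similarity = float(y[nearest].mean())

print(pred_regression, pred_similarity)
```

Both predictors draw on the same embedding geometry, which gives some intuition for the abstract's finding that their accuracies do not differ significantly.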