The dependency of word similarity in vector space models on the frequency of words has been noted in a few studies, but has received very little attention. We study the influence of word frequency in a set of 10 000 randomly selected word pairs for a number of different combinations of feature weighting schemes and similarity measures. We find that the similarity of word pairs for all methods, except for the one using singular value decomposition to reduce the dimensionality of the feature space, is determined to a large extent by the frequency of the words. In a binary classification task of pairs of synonyms and unrelated words we find that for all similarity measures the results can be improved when we correct for the frequency bias.
This paper describes the approach of the Hochschule Hannover to the SemEval 2013 task Evaluating Phrasal Semantics. To compare a single word with a two-word phrase, we compute various distributional similarities, including a new similarity measure based on Jensen-Shannon divergence with a correction for frequency effects. The classification is done by a support vector machine that uses all similarities as features. The approach turned out to be the most successful one in the task.
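As an illustration of the underlying measure, the following minimal sketch computes the plain Jensen-Shannon divergence between the context distributions of two expressions. It assumes that each expression is represented by non-empty raw co-occurrence counts; the function names and the toy counts are hypothetical, and the frequency correction mentioned in the abstract is not reproduced here.

```python
import math

def normalize(counts):
    """Turn raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    return {feature: value / total for feature, value in counts.items()}

def kl_divergence(p, m):
    """Kullback-Leibler divergence D(p || m) in bits; m must cover p's support."""
    return sum(pv * math.log2(pv / m[f]) for f, pv in p.items() if pv > 0)

def jensen_shannon_divergence(p_counts, q_counts):
    """Plain JSD (base 2) between two count vectors; 0 = identical, 1 = disjoint."""
    p = normalize(p_counts)
    q = normalize(q_counts)
    features = set(p) | set(q)
    # Mixture distribution: the average of the two distributions.
    m = {f: 0.5 * (p.get(f, 0.0) + q.get(f, 0.0)) for f in features}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy context counts (invented for illustration only).
word = {"drink": 4, "hot": 3, "cup": 3}
phrase = {"drink": 2, "hot": 1, "cup": 1, "green": 4}
print(jensen_shannon_divergence(word, phrase))
```

A lower value indicates more similar context distributions. In a setup like the one described, such a value would be one of several similarity features passed to the classifier.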
This paper presents a way to extend the formalism of linear indexed grammars (LIGs). The extension is based on the use of tuples of pushdowns, instead of a single pushdown, to store indices during a derivation. If a restriction on the accessibility of the pushdowns is imposed, it can be shown that the resulting formalisms give rise to a hierarchy of languages that is equivalent to a hierarchy defined by Weir. This equivalence was already known for a slightly different formalism; this paper gives a new proof of it. Since all languages of Weir's hierarchy are known to be mildly context sensitive, the proposed extensions of LIGs become comparable with extensions of tree adjoining grammars and head grammars.
In this paper we investigate how concreteness and abstractness are represented in word embedding spaces. We use data for English and German, and show that concreteness and abstractness can be determined independently and turn out to be completely opposite directions in the embedding space. Various methods can be used to determine the direction of concreteness, always resulting in roughly the same vector. Though concreteness is a central aspect of the meaning of words and can be detected clearly in embedding spaces, it does not seem as easy to subtract concreteness from, or add it to, words to obtain other words or word senses as it is for a semantic property like gender.
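One simple way to obtain such a direction can be sketched as follows. This is an assumption for illustration, not necessarily one of the methods used in the paper: the direction is taken as the normalized difference between the mean vector of some concrete words and the mean vector of some abstract words, and a word is scored by its projection onto that direction. The two-dimensional vectors and all names are invented.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def concreteness_direction(concrete_vectors, abstract_vectors):
    """Unit vector pointing from the abstract mean towards the concrete mean."""
    c = mean_vector(concrete_vectors)
    a = mean_vector(abstract_vectors)
    diff = [ci - ai for ci, ai in zip(c, a)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def project(vector, direction):
    """Scalar projection of a word vector onto the direction (dot product)."""
    return sum(v * d for v, d in zip(vector, direction))

# Toy 2-d "embeddings" (hypothetical; real embeddings have hundreds of dimensions).
concrete = [[1.0, 0.0], [0.8, 0.2]]
abstract = [[-1.0, 0.0], [-0.8, 0.2]]
direction = concreteness_direction(concrete, abstract)

print(project([0.5, 0.3], direction))   # positive: leans concrete
print(project([-0.4, 0.9], direction))  # negative: leans abstract
```

If concreteness and abstractness really are opposite directions, a direction estimated from abstract words alone should point roughly opposite to this one.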
This paper summarizes the results of a comprehensive statistical analysis of a corpus of open access articles and the figures they contain. It offers insight into quantitative relationships between illustrations or types of illustrations, caption lengths, subjects, publishers, author affiliations, article citations and other properties.
Editorial for the 17th European Networked Knowledge Organization Systems Workshop (NKOS 2017)
(2017)
Knowledge Organization Systems (KOS), in the form of classification systems, thesauri, lexical databases, ontologies, and taxonomies, play a crucial role in digital information management and applications generally. Carrying semantics in a well-controlled and documented way, Knowledge Organization Systems serve a variety of important functions: as tools for the representation and indexing of information and documents, as knowledge-based support for information searchers, as semantic road maps to domains and disciplines, as communication tools that provide a conceptual framework, and as the conceptual basis for knowledge-based systems, e.g. automated classification systems. New networked KOS (NKOS) services and applications are emerging, and we have reached a stage where many KOS standards exist and the integration of linked services is no longer just a future scenario. This editorial describes the workshop outline and gives an overview of the papers presented at the 17th European Networked Knowledge Organization Systems Workshop (NKOS 2017), which was held during the TPDL 2017 Conference in Thessaloniki, Greece.
Editorial for the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016)
(2016)
Knowledge Organization Systems (KOS), in the form of classification systems, thesauri, lexical databases, ontologies, and taxonomies, play a crucial role in digital information management and applications generally. Carrying semantics in a well-controlled and documented way, Knowledge Organization Systems serve a variety of important functions: as tools for the representation and indexing of information and documents, as knowledge-based support for information searchers, as semantic road maps to domains and disciplines, as communication tools that provide a conceptual framework, and as the conceptual basis for knowledge-based systems, e.g. automated classification systems. New networked KOS (NKOS) services and applications are emerging, and we have reached a stage where many KOS standards exist and the integration of linked services is no longer just a future scenario. This editorial describes the workshop outline and gives an overview of the papers presented at the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016) in Hannover, Germany.
The number of papers published each year has been increasing for decades. Libraries need to make these resources accessible and available, and classification is an important part of this process. This paper analyzes prerequisites and possibilities of automatic classification of medical literature. We explain the selection, preprocessing and analysis of data consisting of catalogue datasets from the library of the Hanover Medical School, Lower Saxony, Germany. In the present study, 19,348 documents, represented by notations of library classification systems such as the Dewey Decimal Classification (DDC), were classified into 514 different classes from the National Library of Medicine (NLM) classification system. The algorithm used was k-nearest-neighbours (kNN). A correct classification rate of 55.7% was achieved. To the best of our knowledge, this is not only the first research conducted towards the use of the NLM classification in automatic classification but also the first approach that exclusively considers already assigned notations from other classification systems for this purpose.
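The general idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the setup of the study: each document is represented as a set of already assigned notations, similarity between documents is measured with the Jaccard coefficient (the abstract does not name a similarity measure), and the target class is chosen by majority vote among the k most similar training documents. The training data, the notation strings and the function names are invented placeholders.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity between two sets of notations."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_classify(query, training, k=3):
    """Return the majority class among the k training items most similar to query.

    training is a list of (notation_set, target_class) pairs.
    """
    ranked = sorted(training, key=lambda item: jaccard(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training data: notation sets paired with a target class (placeholders).
training = [
    ({"ddc:616.1", "ddc:612.1"}, "WG"),
    ({"ddc:616.1"}, "WG"),
    ({"ddc:616.8"}, "WL"),
    ({"ddc:616.8", "ddc:612.8"}, "WL"),
]

print(knn_classify({"ddc:616.1"}, training, k=3))
print(knn_classify({"ddc:616.8", "ddc:612.8"}, training, k=3))
```

With 514 target classes and sparse notation sets, the choice of k and of the similarity measure would need to be tuned on held-out data; the values here are arbitrary.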
To learn a subject, the acquisition of the associated technical language is important. Despite this widely accepted importance, hardly any published studies describe the characteristics of the technical languages that students are supposed to learn. This might largely be due to the absence of specialized text corpora for studying such languages at the lexical, syntactic and textual levels. In the present paper we describe a corpus of German physics texts that can be used to study the language used in physics. A large and a small variant have been compiled. The small version of the corpus consists of 5.3 million words and is available on request.