Refine
Year of publication
Document Type
- Conference Proceeding (52) (remove)
Has Fulltext
- yes (52)
Is part of the Bibliography
- no (52)
Keywords
- Text Mining (5)
- Concreteness (4)
- Semantik (4)
- Ausbildung (3)
- Bibliothek (3)
- German (3)
- Information Retrieval (3)
- Informationsmanagement (3)
- Klassifikation (3)
- Bibliothekswesen (2)
Institute
- Fakultät III - Medien, Information und Design (52) (remove)
Scientific papers from all disciplines contain many abbreviations and acronyms. In many cases these acronyms are ambiguous. We present a method to choose the contextual correct definition of an acronym that does not require training for each acronym and thus can be applied to a large number of different acronyms with only few instances. We constructed a set of 19,954 examples of 4,365 ambiguous acronyms from image captions in scientific papers along with their contextually correct definition from different domains. We learn word embeddings for all words in the corpus and compare the averaged context vector of the words in the expansion of an acronym with the weighted average vector of the words in the context of the acronym. We show that this method clearly outperforms (classical) cosine similarity. Furthermore, we show that word embeddings learned from a 1 billion word corpus of scientific exts outperform word embeddings learned from much larger general corpora.
Concreteness of words has been studied extensively in psycholinguistic literature. A number of datasets have been created with average values for perceived concreteness of words. We show that we can train a regression model on these data, using word embeddings and morphological features, that can predict these concreteness values with high accuracy. We evaluate the model on 7 publicly available datasets. Only for a few small subsets of these datasets prediction of concreteness values are found in the literature. Our results clearly outperform the reported results for these datasets.
Concreteness of words has been measured and used in psycholinguistics already for decades. Recently, it is also used in retrieval and NLP tasks. For English a number of well known datasets has been established with average values for perceived concreteness.
We give an overview of available datasets for German, their correlation and evaluate prediction algorithms for concreteness of German words. We show that these algorithms achieve similar results as for English datasets. Moreover, we show for all datasets there are no significant differences between a prediction model based on a regression model using word embeddings as features and a prediction algorithm based on word similarity according to the same embeddings.
Image captions in scientific papers usually are complementary to the images. Consequently, the captions contain many terms that do not refer to concepts visible in the image. We conjecture that it is possible to distinguish between these two types of terms in an image caption by analysing the text only. To examine this, we evaluated different features. The dataset we used to compute tf.idf values, word embeddings and concreteness values contains over 700 000 scientific papers with over 4,6 million images. The evaluation was done with a manually annotated subset of 329 images. Additionally, we trained a support vector machine to predict whether a term is a likely visible or not. We show that concreteness of terms is a very important feature to identify terms in captions and context that refer to concepts visible in images.
Data and Information Science: Book of Abstracts at BOBCATSSS 2022 Hybrid Conference, 23rd - 25th of May 2022, Debrecen.
This year marks the 30th anniversary of the BOBCATSSS. The BOBCATSSS is an international, annual symposium designed for librarians and information professionals in a rapidly changing environment. Over the past 30 years, the conference has included exciting topics, great venues, interested guests and engaging presenters.
This year we would like to introduce the topics of the many papers presented in the Book of Abstracts for the first time in presence at the University of Debrecen and hybrid. The Book of Abstracts provides an overview of all presentations given at BOBCATSSS. Presentations are listed in alphabetical order by title and include speeches, Pecha Kuchas, posters and workshops.
The theme of BOBCATSSS is Data and Information Science. Data and information are the basis for decisions and processes in business, politics and science. Particularly important in the current era of digital transformation. This is exactly where this year's subthemes come in. They deal with data science, openness as well as institutional roles.
The Logical Observation Identifiers, Names and Codes (LOINC) is a common terminology used for standardizing laboratory terms. Within the consortium of the HiGHmed project, LOINC is one of the central terminologies used for health data sharing across all university sites. Therefore, linking the LOINC codes to the site-specific tests and measures is one crucial step to reach this goal. In this work we report our ongoing efforts in implementing LOINC to our laboratory information system and research infrastructure, as well as our challenges and the lessons learned. 407 local terms could be mapped to 376 LOINC codes of which 209 are already available to routine laboratory data. In our experience, mapping of local terms to LOINC is a widely manual and time consuming process for reasons of language and expert knowledge of local laboratory procedures.
„Grappa“ ist eine Middleware, die auf die Anbindung verschiedener Autobewerter an verschiedene E-Learning-Frontends respektive Lernmanagementsysteme (LMS) spezialisiert ist. Ein Prototyp befindet sich seit mehreren Semestern an der Hochschule Hannover mit dem LMS „moodle“ und dem Backend „aSQLg“ im Einsatz und wird regelmäßig evaluiert. Dieser Beitrag stellt den aktuellen Entwicklungsstand von Grappa nach diversen Neu- und Weiterentwicklungen vor. Nach einem Bericht über zuletzt gesammelte Erfahrungen mit der genannten Kombination von Systemen stellen wir wesentliche Neuerungen der moodle-Plugins, welche der Steuerung von Grappa aus moodle heraus dienen, vor. Anschließend stellen wir eine Erweiterung der bisherigen Architektur in Form eines neuentwickelten Grappa-php-Clients zur effizienteren Anbindung von LMS vor. Weiterhin berichten wir über die Anbindung eines weiteren Autobewerters „Graja“ für Programmieraufgaben in Java. Der Bericht zeigt, dass bereits wichtige Schritte für eine einheitliche Darstellung automatisierter Programmbewertung in LMS mit unterschiedlichen Autobewertern für die Studierenden absolviert sind. Die praktischen Erfahrungen zeigen aber auch, dass sowohl bei jeder der Systemkomponenten individuell, wie auch in deren Zusammenspiel via Grappa noch weitere Entwicklungsarbeiten erforderlich sind, um die Akzeptanz und Nutzung bei Studierenden sowie Lehrenden weiter zu steigern.
Regional knowledge map is a tool recently demanded by some actors in an institutional level to help regional policy and innovation in a territory. Besides, knowledge maps facilitate the interaction between the actors of a territory and the collective learning. This paper reports the work in progress of a research project which objective is to define a methodology to efficiently design territorial knowledge maps, by extracting information of big volumes of data contained in diverse sources of information related to a region. Knowledge maps facilitate management of the intellectual capital in organisations. This paper investigates the value to apply this tool to a territorial region to manage the structures, infrastructures and the resources to enable regional innovation and regional development. Their design involves the identification of information sources that are required to find which knowledge is located in a territory, which actors are involved in innovation, and which is the context to develop this innovation (structures, infrastructures, resources and social capital). This paper summarizes the theoretical background and framework for the design of a methodology for the construction of knowledge maps, and gives an overview of the main challenges for the design of regional knowledge maps.
Das ProFormA-Aufgabenformat wurde eingeführt, um den Austausch von Programmieraufgaben zwischen beliebigen Autobewertern (Grader) zu ermöglichen. Ein Autobewerter führt im ProFormA-Aufgabenformat spezifizierte „Tests“ sequentiell aus, um ein vom Studierenden eingereichtes Programm zu prüfen. Für die Strukturierung und Darstellung der Testergebnisse existiert derzeit kein graderübergreifender Standard. Wir schlagen eine Erweiterung des ProFormA-Aufgabenformats um eine Hierarchie von Bewertungsaspekten vor, die nach didaktischen Aspekten gruppiert ist und entsprechende Testausführungen referenziert. Die Erweiterung wurde in Graja umgesetzt, einem Autobewerter für Java-Programme. Je nach gewünschter Detaillierung der Bewertungsaspekte sind Testausführungen in Teilausführungen aufzubrechen. Wir illustrieren unseren Vorschlag mit den Testwerkzeugen Compiler, dynamischer Softwaretest, statische Analyse sowie unter Einsatz menschlicher Bewerter.