Search

Using Word Embeddings for Unsupervised Acronym Disambiguation (2018)

Scientific papers from all disciplines contain many abbreviations and acronyms. In many cases these acronyms are ambiguous. We present a method to choose the contextual correct definition of an acronym that does not require training for each acronym and thus can be applied to a large number of different acronyms with only few instances. We constructed a set of 19,954 examples of 4,365 ambiguous acronyms from image captions in scientific papers along with their contextually correct definition from different domains. We learn word embeddings for all words in the corpus and compare the averaged context vector of the words in the expansion of an acronym with the weighted average vector of the words in the context of the acronym. We show that this method clearly outperforms (classical) cosine similarity. Furthermore, we show that word embeddings learned from a 1 billion word corpus of scientific exts outperform word embeddings learned from much larger general corpora.

Predicting Word Concreteness and Imagery (2019)

Charbonnier, Jean ; Wartena, Christian

Concreteness of words has been studied extensively in psycholinguistic literature. A number of datasets have been created with average values for perceived concreteness of words. We show that we can train a regression model on these data, using word embeddings and morphological features, that can predict these concreteness values with high accuracy. We evaluate the model on 7 publicly available datasets. Only for a few small subsets of these datasets prediction of concreteness values are found in the literature. Our results clearly outperform the reported results for these datasets.

Predicting the Concreteness of German Words (2020)

Charbonnier, Jean ; Wartena, Christian

Concreteness of words has been measured and used in psycholinguistics already for decades. Recently, it is also used in retrieval and NLP tasks. For English a number of well known datasets has been established with average values for perceived concreteness. We give an overview of available datasets for German, their correlation and evaluate prediction algorithms for concreteness of German words. We show that these algorithms achieve similar results as for English datasets. Moreover, we show for all datasets there are no significant differences between a prediction model based on a regression model using word embeddings as features and a prediction algorithm based on word similarity according to the same embeddings.

Verbal Idioms: Concrete Nouns in Abstract Contexts (2021)

Charbonnier, Jean ; Wartena, Christian

In this paper, we present our approach for the KONVENS 2021 shared task Disambiguation of German Verbal Idioms. Our model is a decision tree-based classifier that uses static word embeddings and computed concreteness values to predict whether a verbal idiom is used figuratively or literal.

Predicting Visible Terms from Image Captions using Concreteness and Distributional Semantics (2022)

Charbonnier, Jean ; Wartena, Christian

Image captions in scientific papers usually are complementary to the images. Consequently, the captions contain many terms that do not refer to concepts visible in the image. We conjecture that it is possible to distinguish between these two types of terms in an image caption by analysing the text only. To examine this, we evaluated different features. The dataset we used to compute tf.idf values, word embeddings and concreteness values contains over 700 000 scientific papers with over 4,6 million images. The evaluation was done with a manually annotated subset of 329 images. Additionally, we trained a support vector machine to predict whether a term is a likely visible or not. We show that concreteness of terms is a very important feature to identify terms in captions and context that refer to concepts visible in images.

30th BOBCATSSS Symposium - Book of Abstracts (2022)

Dille, Nils ; Stegemeyer, Merle ; Janus, Leandra ; Witten, Marna ; Menzel, Marie-Antoinette ; Arnold, Michelle ; Eichhorn, Karin

Data and Information Science: Book of Abstracts at BOBCATSSS 2022 Hybrid Conference, 23rd - 25th of May 2022, Debrecen. This year marks the 30th anniversary of the BOBCATSSS. The BOBCATSSS is an international, annual symposium designed for librarians and information professionals in a rapidly changing environment. Over the past 30 years, the conference has included exciting topics, great venues, interested guests and engaging presenters. This year we would like to introduce the topics of the many papers presented in the Book of Abstracts for the first time in presence at the University of Debrecen and hybrid. The Book of Abstracts provides an overview of all presentations given at BOBCATSSS. Presentations are listed in alphabetical order by title and include speeches, Pecha Kuchas, posters and workshops. The theme of BOBCATSSS is Data and Information Science. Data and information are the basis for decisions and processes in business, politics and science. Particularly important in the current era of digital transformation. This is exactly where this year's subthemes come in. They deal with data science, openness as well as institutional roles.

Implementing LOINC – Current Status and Ongoing Work at a Medical University (2019)

Fiebeck, Johanna ; Gietzelt, Matthias ; Ballout, Sarah ; Christmann, Martin ; Fradziak, Maikel ; Laser, Hans ; Ruppel, Julia ; Schönfeld, Norman ; Teppner, Sonja ; Gerbel, Svetlana

The Logical Observation Identifiers, Names and Codes (LOINC) is a common terminology used for standardizing laboratory terms. Within the consortium of the HiGHmed project, LOINC is one of the central terminologies used for health data sharing across all university sites. Therefore, linking the LOINC codes to the site-specific tests and measures is one crucial step to reach this goal. In this work we report our ongoing efforts in implementing LOINC to our laboratory information system and research infrastructure, as well as our challenges and the lessons learned. 407 local terms could be mapped to 376 LOINC codes of which 209 are already available to routine laboratory data. In our experience, mapping of local terms to LOINC is a widely manual and time consuming process for reasons of language and expert knowledge of local laboratory procedures.

Grading mit Grappa - Ein Werkstattbericht (2015)

Fricke, Peter ; Garmann, Robert ; Heine, Felix ; Kleiner, Carsten ; Reiser, Paul ; De Vere Peratoner, Immanuel ; Grzanna, Sören ; Wübbelt, Peter ; Bott, Oliver J.

„Grappa“ ist eine Middleware, die auf die Anbindung verschiedener Autobewerter an verschiedene E-Learning-Frontends respektive Lernmanagementsysteme (LMS) spezialisiert ist. Ein Prototyp befindet sich seit mehreren Semestern an der Hochschule Hannover mit dem LMS „moodle“ und dem Backend „aSQLg“ im Einsatz und wird regelmäßig evaluiert. Dieser Beitrag stellt den aktuellen Entwicklungsstand von Grappa nach diversen Neu- und Weiterentwicklungen vor. Nach einem Bericht über zuletzt gesammelte Erfahrungen mit der genannten Kombination von Systemen stellen wir wesentliche Neuerungen der moodle-Plugins, welche der Steuerung von Grappa aus moodle heraus dienen, vor. Anschließend stellen wir eine Erweiterung der bisherigen Architektur in Form eines neuentwickelten Grappa-php-Clients zur effizienteren Anbindung von LMS vor. Weiterhin berichten wir über die Anbindung eines weiteren Autobewerters „Graja“ für Programmieraufgaben in Java. Der Bericht zeigt, dass bereits wichtige Schritte für eine einheitliche Darstellung automatisierter Programmbewertung in LMS mit unterschiedlichen Autobewertern für die Studierenden absolviert sind. Die praktischen Erfahrungen zeigen aber auch, dass sowohl bei jeder der Systemkomponenten individuell, wie auch in deren Zusammenspiel via Grappa noch weitere Entwicklungsarbeiten erforderlich sind, um die Akzeptanz und Nutzung bei Studierenden sowie Lehrenden weiter zu steigern.

Regional Knowledge Maps - Potential and Challenges (2013)

Garcia-Alsina, Montserrat ; Wartena, Christian ; Lieberam-Schmidt, Sönke

Regional knowledge map is a tool recently demanded by some actors in an institutional level to help regional policy and innovation in a territory. Besides, knowledge maps facilitate the interaction between the actors of a territory and the collective learning. This paper reports the work in progress of a research project which objective is to define a methodology to efficiently design territorial knowledge maps, by extracting information of big volumes of data contained in diverse sources of information related to a region. Knowledge maps facilitate management of the intellectual capital in organisations. This paper investigates the value to apply this tool to a territorial region to manage the structures, infrastructures and the resources to enable regional innovation and regional development. Their design involves the identification of information sources that are required to find which knowledge is located in a territory, which actors are involved in innovation, and which is the context to develop this innovation (structures, infrastructures, resources and social capital). This paper summarizes the theoretical background and framework for the design of a methodology for the construction of knowledge maps, and gives an overview of the main challenges for the design of regional knowledge maps.

Bewertungsaspekte und Tests in Java-Programmieraufgaben für Graja im ProFormA-Aufgabenformat (2016)

Garmann, Robert ; Fricke, Peter ; Bott, Oliver J.

Das ProFormA-Aufgabenformat wurde eingeführt, um den Austausch von Programmieraufgaben zwischen beliebigen Autobewertern (Grader) zu ermöglichen. Ein Autobewerter führt im ProFormA-Aufgabenformat spezifizierte „Tests“ sequentiell aus, um ein vom Studierenden eingereichtes Programm zu prüfen. Für die Strukturierung und Darstellung der Testergebnisse existiert derzeit kein graderübergreifender Standard. Wir schlagen eine Erweiterung des ProFormA-Aufgabenformats um eine Hierarchie von Bewertungsaspekten vor, die nach didaktischen Aspekten gruppiert ist und entsprechende Testausführungen referenziert. Die Erweiterung wurde in Graja umgesetzt, einem Autobewerter für Java-Programme. Je nach gewünschter Detaillierung der Bewertungsaspekte sind Testausführungen in Teilausführungen aufzubrechen. Wir illustrieren unseren Vorschlag mit den Testwerkzeugen Compiler, dynamischer Softwaretest, statische Analyse sowie unter Einsatz menschlicher Bewerter.

Open Access

Refine

Author

Year of publication

Document Type

Language

Has Fulltext

Is part of the Bibliography

Keywords

Institute

52 search hits