Predicting the Concreteness of German Words (2020)

The concreteness of words has been measured and used in psycholinguistics for decades. Recently, it has also been used in retrieval and NLP tasks. For English, a number of well-known datasets with average values for perceived concreteness have been established. We give an overview of the available datasets for German and their correlations, and we evaluate prediction algorithms for the concreteness of German words. We show that these algorithms achieve results similar to those for English datasets. Moreover, we show that for all datasets there are no significant differences between a prediction model based on a regression model using word embeddings as features and a prediction algorithm based on word similarity according to the same embeddings.
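The similarity-based prediction mentioned in this abstract can be illustrated with a minimal sketch. It assumes a similarity-weighted average over the k most similar rated words; the toy vectors, the ratings, the function names and the choice of k are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_concreteness(word_vec, rated, k=2):
    """Estimate a rating as the similarity-weighted mean of the k nearest rated words.

    rated maps a word to a (vector, rating) pair. This weighting scheme is an
    assumption made for illustration only.
    """
    neighbours = sorted(
        ((cosine(word_vec, vec), rating) for vec, rating in rated.values()),
        reverse=True,
    )[:k]
    # Negative similarities are clipped so they cannot produce negative weights.
    weights = [max(sim, 0.0) for sim, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        # Fall back to the plain mean when no neighbour is similar at all.
        return sum(r for _, r in neighbours) / len(neighbours)
    return sum(w * r for w, (_, r) in zip(weights, neighbours)) / total


# Hypothetical two-dimensional "embeddings" with made-up concreteness ratings.
RATED = {
    "Tisch": ([1.0, 0.0], 4.8),
    "Stuhl": ([0.9, 0.1], 4.7),
    "Idee": ([0.0, 1.0], 1.5),
}

if __name__ == "__main__":
    print(predict_concreteness([0.95, 0.05], RATED, k=2))
```

A regression model on the same embeddings would instead learn a direct mapping from vector to rating; the abstract reports no significant difference between the two approaches.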

Verbal Idioms: Concrete Nouns in Abstract Contexts (2021)

Charbonnier, Jean ; Wartena, Christian

In this paper, we present our approach for the KONVENS 2021 shared task Disambiguation of German Verbal Idioms. Our model is a decision tree-based classifier that uses static word embeddings and computed concreteness values to predict whether a verbal idiom is used figuratively or literally.

Predicting Visible Terms from Image Captions using Concreteness and Distributional Semantics (2022)

Charbonnier, Jean ; Wartena, Christian

Image captions in scientific papers usually are complementary to the images. Consequently, the captions contain many terms that do not refer to concepts visible in the image. We conjecture that it is possible to distinguish between these two types of terms in an image caption by analysing the text only. To examine this, we evaluated different features. The dataset we used to compute tf.idf values, word embeddings and concreteness values contains over 700,000 scientific papers with over 4.6 million images. The evaluation was done with a manually annotated subset of 329 images. Additionally, we trained a support vector machine to predict whether a term is likely to be visible or not. We show that concreteness of terms is a very important feature to identify terms in captions and context that refer to concepts visible in images.

A study on the kinematics of a new Schukey-type rotary compressor (2021)

Cui, Bin ; Becker, Klaus ; Lüdersen, Ulrich ; Gottschlich, Martin ; Kabelac, Stephan

A new type of rotary compressor, called “rotary-chamber compressor”, consists of two interlocking rotors with 4 wings each that perform non-uniform rotary movements. Both rotors have the same direction of rotation; while one rotor is accelerating, the other rotor is retarding. After surpassing a specific mark, the sequence changes and the leading rotor begins to retard and vice versa. Due to the resulting relative phase difference, the volume between the two wings changes periodically, which allows pulsating working chambers. The technology was first introduced by its founder Jürgen Schukey in 1987. Since then, no further development on this machine is known to us except our own. In this contribution, a study on the kinematics of the rotary-chamber compressor is presented. Initial studies have shown that changes in the kinematics of the rotors will have a direct influence on the thermodynamical variables, which, if optimized, can lead to an increased performance of the machine. Therefore, a mathematical model has been developed to obtain the performance parameters from different kinematic concepts by using numerical CFD analysis. Furthermore, additional optimization possibilities will be listed and discussed.

30th BOBCATSSS Symposium - Book of Abstracts (2022)

Dille, Nils ; Stegemeyer, Merle ; Janus, Leandra ; Witten, Marna ; Menzel, Marie-Antoinette ; Arnold, Michelle ; Eichhorn, Karin

Data and Information Science: Book of Abstracts at BOBCATSSS 2022 Hybrid Conference, 23rd - 25th of May 2022, Debrecen. This year marks the 30th anniversary of the BOBCATSSS. The BOBCATSSS is an international, annual symposium designed for librarians and information professionals in a rapidly changing environment. Over the past 30 years, the conference has included exciting topics, great venues, interested guests and engaging presenters. This year we would like to introduce the topics of the many papers presented in the Book of Abstracts for the first time in presence at the University of Debrecen and hybrid. The Book of Abstracts provides an overview of all presentations given at BOBCATSSS. Presentations are listed in alphabetical order by title and include speeches, Pecha Kuchas, posters and workshops. The theme of BOBCATSSS is Data and Information Science. Data and information are the basis for decisions and processes in business, politics and science, which is particularly important in the current era of digital transformation. This is exactly where this year's subthemes come in. They deal with data science, openness as well as institutional roles.

Professional Life of Information System Graduates : Impressions and Experiences (2019)

Disterer, Georg

Aim/Purpose: We explore impressions and experiences of Information Systems graduates during their first years of employment in the IT field. The results help to understand work satisfaction, career ambition, and motivation of junior employees. This way, the attractiveness of working in the field of IS can be increased and the shortage of junior employees reduced. Background: Currently, IT professions are characterized by terms such as “shortage of professionals” and “shortage of junior employees”. To attract more people to work in IT, detailed knowledge about the experiences of junior employees is necessary. Methodology: Data from a large survey of 193 graduates of the degree program “Information Systems” at University of Applied Sciences and Arts Hannover (Germany) show characteristics of their professional life such as work satisfaction, motivation, career ambition, satisfaction with opportunities, development and career advancement, and satisfaction with work-life balance. The survey also asks whether men and women gain the same experiences when entering the job market and have the same perceptions. Findings: The participants were highly satisfied with their work, but limitations or restrictions due to gender are noteworthy. Recommendations for Practitioners: The results provide information on how human resource policies can make IT professions more attractive and thus convince graduates to seek jobs in the field. For instance, improving the balance between work and various areas of private life seems promising. Also, restrictions with respect to the work climate and improving communication along several dimensions need to be considered. Future Research: More detailed research on ambition and achievement is necessary to understand gender differences.

BYOD Bring Your Own Device (2013)

Disterer, Georg ; Kleiner, Carsten

Using modern devices like smartphones and tablets offers a wide variety of advantages; this has made them very popular as consumer devices in private life. Using them in the workplace is also popular. However, who wants to carry around and handle two devices: one for personal use and one for work-related tasks? That is why “dual use”, using one single device for private and business applications, may represent a proper solution. The result is “Bring Your Own Device,” or BYOD, which describes the circumstance in which users make their own personal devices available for company use. For companies, this brings both opportunities and risks. We describe and discuss organizational issues, technical approaches, and solutions.

Incorporating Situation Awareness into Recommender Systems (2017)

Dötterl, Jeremias ; Bruns, Ralf ; Dunkel, Jürgen

Nowadays, smartphones and sensor devices can provide a variety of information about a user’s current situation. So far, many recommender systems neglect this kind of information and thus cannot provide situation-specific recommendations. Situation-aware recommender systems adapt to changes in the user’s environment and therefore are able to offer recommendations that are more appropriate for the current situation. In this paper, we present a software architecture that enables situation awareness for arbitrary recommendation techniques. The proposed system considers both (semi-)static user profiles and volatile situational knowledge to obtain meaningful recommendations. Furthermore, the implementation of the architecture in a museum of natural history is presented, which uses Complex Event Processing to achieve situation awareness.

On-Time Delivery in Crowdshipping Systems: An Agent-Based Approach Using Streaming Data (2020)

Dötterl, Jeremias ; Bruns, Ralf ; Dunkel, Jürgen ; Ossowski, Sascha

In parcel delivery, the “last mile” from the parcel hub to the customer is costly, especially for time-sensitive delivery tasks that have to be completed within hours after arrival. Recently, crowdshipping has attracted increased attention as a new alternative to traditional delivery modes. In crowdshipping, private citizens (“the crowd”) perform short detours in their daily lives to contribute to parcel delivery in exchange for small incentives. However, achieving desirable crowd behavior is challenging as the crowd is highly dynamic and consists of autonomous, self-interested individuals. Leveraging crowdshipping for time-sensitive deliveries remains an open challenge. In this paper, we present an agent-based approach to on-time parcel delivery with crowds. Our system performs data stream processing on the couriers’ smartphone sensor data to predict delivery delays. Whenever a delay is predicted, the system attempts to forge an agreement for transferring the parcel from the current deliverer to a more promising courier nearby. Our experiments show that through accurate delay predictions and purposeful task transfers many delays can be prevented that would occur without our approach.

Implementing LOINC – Current Status and Ongoing Work at a Medical University (2019)

Fiebeck, Johanna ; Gietzelt, Matthias ; Ballout, Sarah ; Christmann, Martin ; Fradziak, Maikel ; Laser, Hans ; Ruppel, Julia ; Schönfeld, Norman ; Teppner, Sonja ; Gerbel, Svetlana

The Logical Observation Identifiers, Names and Codes (LOINC) is a common terminology used for standardizing laboratory terms. Within the consortium of the HiGHmed project, LOINC is one of the central terminologies used for health data sharing across all university sites. Therefore, linking the LOINC codes to the site-specific tests and measures is one crucial step to reach this goal. In this work we report our ongoing efforts in implementing LOINC in our laboratory information system and research infrastructure, as well as our challenges and the lessons learned. 407 local terms could be mapped to 376 LOINC codes, of which 209 are already available for routine laboratory data. In our experience, mapping of local terms to LOINC is a largely manual and time-consuming process for reasons of language and expert knowledge of local laboratory procedures.

Regional Knowledge Maps - Potential and Challenges (2013)

Garcia-Alsina, Montserrat ; Wartena, Christian ; Lieberam-Schmidt, Sönke

The regional knowledge map is a tool recently demanded by some actors at an institutional level to support regional policy and innovation in a territory. Besides, knowledge maps facilitate the interaction between the actors of a territory and collective learning. This paper reports the work in progress of a research project whose objective is to define a methodology to efficiently design territorial knowledge maps by extracting information from large volumes of data contained in diverse sources of information related to a region. Knowledge maps facilitate management of the intellectual capital in organisations. This paper investigates the value of applying this tool to a territorial region to manage the structures, infrastructures and resources that enable regional innovation and regional development. Their design involves the identification of the information sources required to find which knowledge is located in a territory, which actors are involved in innovation, and what the context for developing this innovation is (structures, infrastructures, resources and social capital). This paper summarizes the theoretical background and framework for the design of a methodology for the construction of knowledge maps, and gives an overview of the main challenges for the design of regional knowledge maps.

Writer recognition by characters, words and sentences (2009)

Gehrke, Martin ; Steinke, Karl-Heinz ; Dzido, Robert

The methods developed in the research project "Herbar Digital" are to help plant taxonomists to master the great amount of material of about 3.5 million dried plants on paper sheets belonging to the Botanic Museum Berlin in Germany. Frequently the collector of the plant is unknown. So a procedure had to be developed in order to determine the writer of the handwriting on the sheet. In the present work the static character is transformed into a dynamic form. This is done with the model of an inert ball which is rolled through the written character. During this off-line writer recognition, different mathematical procedures are used such as the reproduction of the write line of individual characters by Legendre polynomials. When only one character is used, a recognition rate of about 40% is obtained. By combining multiple characters, the recognition rate rises considerably and reaches 98.7% with 13 characters and 93 writers (chosen randomly from the international IAM-database [3]). Another approach tries to identify the writer by handwritten words. The word is cut out and transformed into a 6-dimensional time series and compared e.g. by means of DTW-methods. A global statistical approach using the whole handwritten sentences results in a similar recognition rate of more than 98%. By combining the methods, a recognition rate of 99.5% is achieved.
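The word-based comparison in this abstract relies on dynamic time warping (DTW). A minimal sketch of the textbook DTW distance between two multi-dimensional time series follows. It is not the authors' implementation, and the function name and the Euclidean point distance are assumptions made for illustration.

```python
import math


def dtw_distance(series_a, series_b):
    """Textbook DTW distance between two sequences of equal-dimension points.

    Each series is a list of tuples (for example 6-dimensional feature vectors).
    The local cost is the Euclidean distance between points, which is an
    assumption; other local distances can be substituted.
    """
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] holds the best alignment cost of the first i and first j points.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(series_a[i - 1], series_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # point of series_b is repeated
                cost[i][j - 1],      # point of series_a is repeated
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]


if __name__ == "__main__":
    slow = [(0.0,), (1.0,), (1.0,), (2.0,)]
    fast = [(0.0,), (1.0,), (2.0,)]
    print(dtw_distance(slow, fast))
```

Because DTW allows points to be repeated during alignment, two series that trace the same shape at different speeds receive a small distance, which is useful when the same word is written at different widths.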

Discovery and efficient reuse of technology pictures using Wikimedia infrastructures. A proposal (2016)

Heller, Lambert ; Blümel, Ina ; Cartellieri, Simone ; Wartena, Christian

Multimedia objects, especially images and figures, are essential for the visualization and interpretation of research findings. The distribution and reuse of these scientific objects is significantly improved under open access conditions, for instance in Wikipedia articles, in research literature, as well as in education and knowledge dissemination, where licensing of images often represents a serious barrier. Whereas scientific publications are retrievable through library portals or other online search services due to standardized indices, there is no targeted retrieval of and access to the accompanying images and figures yet. Consequently, there is a great demand to develop standardized indexing methods for these multimedia open access objects in order to improve the accessibility of this material. With our proposal, we hope to serve a broad audience which looks up a scientific or technical term in a web search portal first. Until now, this audience has had little chance of finding an openly accessible and reusable image narrowly matching their search term on the first try, frustratingly so even if such an image is in fact included in some open access article.

Automated generic integration of flight logbook data into aircraft maintenance systems (2011)

Hunte, Oliver ; Kleiner, Carsten ; Koch, Uwe ; Koschel, Arne ; Koschel, Björn ; Nitz, Stefan

The automated transfer of flight logbook information from aircraft into aircraft maintenance systems leads to reduced ground and maintenance time and is thus desirable from an economic point of view. Until recently, flight logbooks have not been managed electronically in aircraft, or at least the data transfer from aircraft to ground maintenance system has been executed manually. The latest aircraft types such as the Airbus A380 or the Boeing 787 do support an electronic logbook and thus make an automated transfer possible. A generic flight logbook transfer system must deal with different data formats on the input side – due to different aircraft makes and models – as well as different, distributed aircraft maintenance systems for different airlines as aircraft operators. This article contributes the concept and top level distributed system architecture of such a generic system for automated flight log data transfer. It has been developed within a joint industry and applied research project. The architecture has already been successfully evaluated in a prototypical implementation.

Structural Analysis of Contract Renewals (2019)

Josi, Frieda ; Wartena, Christian

In the present paper we sketch an automated procedure to compare different versions of a contract. The contract texts used for this purpose are PDF files with differing structures, which are converted into structured XML files by identifying and classifying text boxes. A classifier trained on manually annotated contracts achieves an accuracy of 87% on this task. We align contract versions and classify aligned text fragments into different similarity classes that enhance the manual comparison of changes in document versions. The main challenges are to deal with OCR errors and different layouts of identical or similar texts. We demonstrate the procedure using some freely available contracts from the City of Hamburg written in German. The methods, however, are language agnostic and can be applied to other contracts as well.

Text-based annotation of scientific images using Wikimedia categories (2018)

Josi, Frieda ; Wartena, Christian ; Charbonnier, Jean

The reuse of scientific raw data is a key demand of Open Science. In the project NOA we foster reuse of scientific images by collecting and uploading them to Wikimedia Commons. In this paper we present a text-based annotation method that proposes Wikipedia categories for open access images. The assigned categories can be used for image retrieval or to upload images to Wikimedia Commons. The annotation basically consists of two phases: extracting salient keywords and mapping these keywords to categories. The results are evaluated on a small record of open access images that were manually annotated.

Detecting Paraphrases of Standard Clause Titles in Insurance Contracts (2019)

Josi, Frieda ; Wartena, Christian ; Heid, Ulrich

For the analysis of contract texts, validated model texts, such as model clauses, can be used to identify used contract clauses. This paper investigates how the similarity between titles of model clauses and headings extracted from contracts can be computed, and which similarity measure is most suitable for this. For the calculation of the similarities between title pairs we tested various variants of string similarity and token based similarity. We also compare two additional semantic similarity measures based on word embeddings using pre-trained embeddings and word embeddings trained on contract texts. The identification of the model clause title can be used as a starting point for the mapping of clauses found in contracts to verified clauses.
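The two families of measures named in this abstract, string similarity and token-based similarity, can be illustrated with a minimal sketch. The specific measures chosen here (the `difflib.SequenceMatcher` ratio and Jaccard overlap of whitespace tokens) and the example titles are assumptions for illustration; the abstract does not say which variants were tested.

```python
from difflib import SequenceMatcher


def string_similarity(title_a, title_b):
    """Character-level similarity in [0, 1] using difflib's matching-blocks ratio."""
    return SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()


def token_jaccard(title_a, title_b):
    """Token-level similarity: shared tokens divided by all distinct tokens."""
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty titles are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Hypothetical model clause title and contract heading.
    model_title = "Beginn des Versicherungsschutzes"
    heading = "Beginn des Schutzes"
    print(string_similarity(model_title, heading))
    print(token_jaccard(model_title, heading))
```

Character-level measures tolerate small spelling or OCR differences, while token-level measures are insensitive to word order; embedding-based measures, as compared in the paper, additionally capture paraphrases that share no surface tokens.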

Representing Standard Text Formulations as Directed Graphs (2021)

Josi, Frieda ; Wartena, Christian ; Heid, Ulrich

In order to ensure validity in legal texts like contracts and case law, lawyers rely on standardised formulations that are written carefully but also represent a kind of code with a meaning and function known to all legal experts. Using directed (acyclic) graphs to represent standardised text fragments, we are able to capture variations concerning time specifications, slight rephrasings, names, places and also OCR errors. We show how we can find such text fragments by sentence clustering, pattern detection and clustering patterns. To test the proposed methods, we use two corpora of German contracts and court decisions, specially compiled for this purpose. However, the entire process for representing standardised text fragments is language-agnostic. We analyze and compare both corpora, give a quantitative and qualitative analysis of the text fragments found, and present a number of examples from both corpora.

Preparing Legal Documents for NLP Analysis: Improving the Classification of Text Elements by Using Page Features (2022)

Josi, Frieda ; Wartena, Christian ; Heid, Ulrich

Legal documents often have a complex layout with many different headings, headers and footers, side notes, etc. For further processing, it is important to extract these individual components correctly from a legally binding document, for example a signed PDF. A common approach to do so is to classify each (text) region of a page using its geometric and textual features. This approach works well when the training and test data have a similar structure and when the documents of a collection to be analyzed have a rather uniform layout. We show that the use of global page properties can improve the accuracy of text element classification: we first classify each page into one of three layout types. After that, we can train a classifier for each of the three page types and thereby improve the accuracy on a manually annotated collection of 70 legal documents consisting of 20,938 text elements. When we split by page type, we achieve an improvement from 0.95 to 0.98 for single-column pages with left marginalia and from 0.95 to 0.96 for double-column pages. We developed our own feature-based method for page layout detection, which we benchmark against a standard implementation of a CNN image classifier. The approach presented here is based on a corpus of freely available German contracts and general terms and conditions. Both the corpus and all manual annotations are made freely available. The method is language agnostic.

Development of a Didactic Online Course Concept for Heterogeneous Audience Groups in the Context of Healthcare IT (2021)

Katzensteiner, Matthias ; Vogel, Stefan ; Hüsers, Jens ; Richter, Jendrik ; Hölken, Johannes ; Lesniewska, Natalia ; Bott, Oliver J.

Building a well-founded understanding of the concepts, tasks and limitations of IT in all areas of society is an essential prerequisite for future developments in business and research. This applies in particular to the healthcare sector and medical research, which are affected by the noticeable advances in digitization. In the transfer project “Zukunftslabor Gesundheit” (ZLG), a teaching framework was developed to support the development of further education online courses in order to teach heterogeneous groups of learners independent of location and prior knowledge. The study at hand describes the development and components of the framework.


120 search hits