After kidney transplantation, graft rejection must be prevented. To this end, a multitude of patient parameters is monitored pre- and postoperatively. To support this process, the Screen Reject research project is developing a data warehouse optimized for kidney rejection diagnostics. In the course of the project it became apparent that important information is available only as free text rather than structured data and therefore cannot be processed by standard ETL tools, which is necessary to establish a digital expert system for rejection diagnostics. For this reason, data integration has been improved by combining methods from natural language processing with methods from image processing. Based on state-of-the-art data warehousing technology (Microsoft SSIS), a generic data integration tool has been developed. The tool was evaluated by extracting Banff classifications from 218 pathology reports and HLA mismatches from about 1,700 PDF files, both written in German.
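As a rough illustration of the kind of free-text extraction involved (a hedged sketch in Python, not the project's SSIS-based tool; the pattern, score abbreviations and sample sentence are invented), Banff lesion scores could be pulled from a German pathology report with a simple regular expression:

import re

# Hypothetical pattern for Banff lesion scores such as "i2", "t1" or "v0";
# the abbreviations and the sample sentence are illustrative only.
BANFF_SCORE = re.compile(r"\b([gitvc]|ci|ct|cv|cg|ah|mm|ptc)\s*([0-3])\b", re.IGNORECASE)

def extract_banff_scores(report_text: str) -> dict:
    """Return a mapping such as {'i': 2, 't': 1, 'v': 0} found in the report."""
    return {lesion.lower(): int(value) for lesion, value in BANFF_SCORE.findall(report_text)}

print(extract_banff_scores("Tubulitis t1, Interstitium i2, v0, keine Glomerulitis g0"))
# {'t': 1, 'i': 2, 'v': 0, 'g': 0}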
This paper deals with new job profiles in libraries, mainly systems librarians (German: Systembibliothekare), IT librarians (German: IT-Bibliothekare) and data librarians (German: Datenbibliothekare). It investigates the vacancies and requirements of these positions in the German-speaking countries by analyzing one hundred and fifty job advertisements published on OpenBiblioJobs between 2012 and 2016. In addition, the distribution of positions, the responsible institutions, the different job titles, as well as contract duration, scope of work and remuneration are evaluated. The analysis of remuneration in the German public sector also provides information on whether a bachelor's or a master's degree is required.
The average annual increase in vacancies between 2012 and 2016 is 14.19%, confirming the demand for these professional library profiles.
The higher remuneration of positions in data management, compared to those for systems librarians, reflects the requirement of a master's degree and thus points to a gap, given that suitable master's degree courses are missing or few. Accordingly, the range of bachelor's degree courses (or IT-oriented major areas of study with compulsory elective modules in existing bachelor's degree courses) for systems and IT librarians must be expanded further. An alternative could be modular education programs for librarians and information scientists with professional experience, as is already the case for music librarians.
Self-directed learning is an essential basis for lifelong learning and requires constantly changing, target-group-specific and personalized conditions in order to motivate people to engage with modern learning content, not to overburden them, and yet to convey complex contexts adequately. Current challenges in dealing with digital resources, such as information overload, reduction of complexity and focus, motivation to learn, self-control and psychological well-being, are taken up in the conception of learning settings within our QpLuS IM project for the study programs Information Management and Information Management extra-occupational (IM) at the University of Applied Sciences and Arts Hannover. We present an interactive video on how search engines work as a practical example of a focused self-learning format of high media quality, produced methodically in line with our agile media-didactic process and stage model of complexity levels.
Automatic classification of scientific records using the German Subject Heading Authority File (SWD)
(2012)
The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). As a consequence, it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3,826 OAI records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. A direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high-quality information with broad coverage for the classification of German scientific articles.
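To make the training-free idea concrete, here is a minimal Python sketch (the tiny heading-to-DDC lexicon is invented; the real method uses the full SWD linked data): each word that is also a subject heading votes for the DDC notations attached to it, and the document is ranked by the accumulated votes.

from collections import Counter

# Invented miniature lexicon mapping SWD-like subject headings to DDC notations.
SWD_TO_DDC = {
    "bibliothek": ["020"],
    "informatik": ["004"],
    "klassifikation": ["025.4"],
}

def classify(text: str, top_n: int = 3):
    votes = Counter()
    for word in text.lower().split():
        for notation in SWD_TO_DDC.get(word, []):
            votes[notation] += 1
    return votes.most_common(top_n)

print(classify("klassifikation in der bibliothek"))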
We present a simple method to find topics in user reviews that accompany ratings for products or services. Standard topic analysis performs sub-optimally on such data, since the word distributions in the documents are determined not only by the topics but also by the sentiment. We reduce the influence of the sentiment on the topic selection by adding two explicit topics representing positive and negative sentiment. We evaluate the proposed method on a set of over 15,000 hospital reviews. We show that the proposed method, Latent Semantic Analysis with explicit word features, finds topics with a much smaller sentiment bias than other, similar methods.
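A rough Python sketch of the idea (not the authors' exact formulation; the word lists and reviews are placeholders): two explicit sentiment columns are appended to the document-term matrix before the SVD, so the latent topics absorb less sentiment-related variance.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

POSITIVE = {"friendly", "excellent", "helpful"}   # placeholder sentiment lexica
NEGATIVE = {"rude", "dirty", "slow"}

reviews = [
    "very friendly and helpful staff",
    "rooms were dirty and the staff was rude",
    "excellent location but slow check in",
]

X = CountVectorizer().fit_transform(reviews).toarray().astype(float)
pos = np.array([[sum(w in POSITIVE for w in r.split())] for r in reviews])
neg = np.array([[sum(w in NEGATIVE for w in r.split())] for r in reviews])
X_ext = np.hstack([X, pos, neg])      # document-term matrix plus two explicit sentiment columns

topics = TruncatedSVD(n_components=2).fit_transform(X_ext)
print(topics.shape)                   # (3, 2): three reviews, two latent topics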
Regional Innovation Systems describe the relations between actors, structures and infrastructures in a region in order to stimulate innovation and regional development. For these systems the collection and organization of information is crucial. In the present paper we investigate the possibilities of extracting information from company websites. First we describe regional innovation systems and the information types that are necessary to create them. Then we discuss the potential of text mining and keyword extraction techniques for extracting this information from company websites. Finally, we describe a small-scale experiment in which keywords related to economic sectors and commodities are extracted from the websites of over 200 companies. This experiment shows the main challenges of information extraction from websites for regional innovation systems.
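As a toy illustration of the keyword extraction step (a hedged Python sketch; the vocabulary and page text are invented, and the paper's pipeline is considerably richer), the plain text of a company page can be matched against a small controlled vocabulary of sector and commodity terms:

# Invented vocabulary mapping commodity terms to economic sectors.
SECTOR_TERMS = {"machinery": "manufacturing", "logistics": "transport", "dairy": "food"}

def extract_sector_keywords(page_text: str) -> set:
    tokens = page_text.lower().split()
    return {SECTOR_TERMS[t] for t in tokens if t in SECTOR_TERMS}

print(extract_sector_keywords("We supply dairy products and logistics services"))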
Lemmatization is a central task in many NLP applications. Despite this importance, the number of (freely) available and easy-to-use tools for German is very limited. To fill this gap, we developed a simple lemmatizer that can be trained on any lemmatized corpus. For a full-form word, the tagger tries to find the sequence of morphemes that is most likely to generate that word. From this sequence of morpheme tags we can easily derive the stem, the lemma and the part of speech (PoS) of the word. We show (i) that the quality of this approach is comparable to state-of-the-art methods and (ii) that we can improve the results of PoS tagging when we include the morphological analysis of each word.
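The following Python sketch is a simplified stand-in, not the morpheme tagger described above: it merely learns, from (full form, lemma) pairs, which suffix rewrite most often maps a word to its lemma and applies the longest matching rule at lookup time.

from collections import Counter, defaultdict

def suffix_rule(word, lemma):
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return word[i:], lemma[i:]          # (suffix to remove, suffix to add)

def train(pairs):
    rules = defaultdict(Counter)
    for word, lemma in pairs:
        old, new = suffix_rule(word, lemma)
        rules[old][new] += 1
    return {old: counts.most_common(1)[0][0] for old, counts in rules.items()}

def lemmatize(word, rules):
    for cut in range(len(word)):        # try the longest known suffix first
        if word[cut:] in rules:
            return word[:cut] + rules[word[cut:]]
    return word

rules = train([("Häuser", "Haus"), ("Bäume", "Baum"), ("ging", "gehen")])
print(lemmatize("Träume", rules))       # -> Traum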
The dependency of word similarity in vector space models on the frequency of words has been noted in a few studies, but has received very little attention. We study the influence of word frequency in a set of 10 000 randomly selected word pairs for a number of different combinations of feature weighting schemes and similarity measures. We find that the similarity of word pairs for all methods, except for the one using singular value decomposition to reduce the dimensionality of the feature space, is determined to a large extent by the frequency of the words. In a binary classification task of pairs of synonyms and unrelated words we find that for all similarity measures the results can be improved when we correct for the frequency bias.
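One simple way to correct for such a frequency bias (a hedged sketch, not necessarily the paper's exact procedure; the scores and frequencies are invented) is to fit a linear trend of pair similarity on the log frequencies of the two words and to use the residual as the corrected score:

import numpy as np

def frequency_corrected(similarities, freqs_a, freqs_b):
    x = np.log(freqs_a) + np.log(freqs_b)
    slope, intercept = np.polyfit(x, similarities, 1)    # linear frequency trend
    return similarities - (slope * x + intercept)        # residual similarity

sims = np.array([0.61, 0.35, 0.72, 0.28])        # invented similarity scores
fa = np.array([12000, 300, 45000, 150])          # invented corpus frequencies
fb = np.array([8000, 500, 30000, 90])
print(frequency_corrected(sims, fa, fb))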
This paper describes the approach of the Hochschule Hannover to the SemEval 2013 task Evaluating Phrasal Semantics. In order to compare a single word with a two-word phrase, we compute various distributional similarities, among them a new similarity measure based on the Jensen-Shannon divergence with a correction for frequency effects. The classification is done by a support vector machine that uses all similarities as features. The approach turned out to be the most successful one in the task.
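A minimal sketch of one ingredient, the Jensen-Shannon divergence between two context distributions (the frequency correction and the SVM combination are not reproduced here; the input distributions are invented), in Python:

import numpy as np

def jensen_shannon(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):                        # Kullback-Leibler divergence in bits
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(jensen_shannon([0.6, 0.3, 0.1], [0.1, 0.4, 0.5]))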
This paper presents a possibility to extend the formalism of linear indexed grammars (LIGs). The extension is based on the use of tuples of pushdowns instead of a single pushdown to store indices during a derivation. If a restriction on the accessibility of the pushdowns is imposed, it can be shown that the resulting formalisms give rise to a hierarchy of languages that is equivalent to a hierarchy defined by Weir. For this equivalence, which was already known for a slightly different formalism, this paper gives a new proof. Since all languages of Weir's hierarchy are known to be mildly context-sensitive, the proposed extensions of LIGs become comparable with extensions of tree adjoining grammars and head grammars.
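For readers unfamiliar with the base formalism, the following textbook-style linear indexed grammar (not taken from the paper itself) generates the mildly context-sensitive language a^n b^n c^n; the single index pushdown is pushed once per a and c, and popped once per b:

S[..] -> a S[i ..] c
S[..] -> T[..]
T[i ..] -> b T[..]
T[] -> ε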
In this paper we investigate how concreteness and abstractness are represented in word embedding spaces. We use data for English and German and show that concreteness and abstractness can be determined independently and turn out to be completely opposite directions in the embedding space. Various methods can be used to determine the direction of concreteness, always resulting in roughly the same vector. Although concreteness is a central aspect of word meaning and can be detected clearly in embedding spaces, it does not seem as easy to add or subtract concreteness to obtain other words or word senses, as can be done with a semantic property such as gender.
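A simple way to obtain such a concreteness direction (a hedged sketch assuming a dict embeddings of word vectors, e.g. loaded from word2vec or fastText; here random placeholder vectors and invented seed lists are used) is the difference of the mean vectors of concrete and abstract seed words:

import numpy as np

rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=50) for w in
              ["stone", "table", "dog", "freedom", "idea", "justice", "hammer"]}

CONCRETE_SEEDS = ["stone", "table", "dog"]
ABSTRACT_SEEDS = ["freedom", "idea", "justice"]

direction = (np.mean([embeddings[w] for w in CONCRETE_SEEDS], axis=0)
             - np.mean([embeddings[w] for w in ABSTRACT_SEEDS], axis=0))
direction /= np.linalg.norm(direction)

def concreteness(word):
    v = embeddings[word]
    return float(np.dot(v, direction) / np.linalg.norm(v))   # projection on the direction

print(concreteness("hammer"))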
Complications may occur after a liver transplantation; therefore, proper monitoring and care in the post-operative phase play a very important role. Monitoring and care for patients from abroad can be difficult for a variety of reasons, e.g. different care facilities. The objective of our research for this paper is to design, implement and evaluate a home monitoring and decision support infrastructure for international children who have undergone a liver transplantation. A point-of-care device and the PedsQL questionnaire were used in the patients' home environment to measure blood parameters and assess quality of life. Using a tablet PC and specially developed software, the measured results could be transmitted to the health care providers via the internet. So far, the developed infrastructure has been evaluated with four international patients/families, who transferred 38 blood test records. The evaluation showed that the home monitoring and decision support infrastructure is technically feasible, is able to give timely alarms in case of abnormal results, and may increase parents' feeling of safety for their children.
The NOA project collects and stores images from open access publications and makes them findable and reusable. During the project, a focus group workshop was held to determine whether the development addresses researchers' needs. It took place before the second half of the project so that the results could inform further development, since addressing users' needs is a central part of the project. The focus was on finding out what content and functionality researchers expect from image repositories.
In a first step, participants were asked to fill out a survey about their image use. In a second step, they tested different use cases on the live system. The first finding is that users have a need to find scholarly images, but it is not a routine task and they often do not know any image repositories. This is another reason for repositories to become more open and to reach users by integrating with other content providers. The second finding is that users paid attention to image licenses but struggled to find and interpret them, while also being unsure how to cite images. In general, there is a high demand for reusing scholarly images, but the existing infrastructure has room for improvement.
In this poster we present the ongoing development of an integrated free and open source toolchain for semantic annotation of digitised cultural heritage. The toolchain development involves the specification of a common data model that aims to increase interoperability across diverse datasets and to enable new collaborative research approaches.
A new FOSS (free and open source software) toolchain and associated workflow is being developed in the context of NFDI4Culture, a German consortium of research and cultural heritage institutions working towards a shared infrastructure for research data that meets the needs of 21st-century data creators, maintainers and end users across the broad spectrum of the digital libraries and archives field and the digital humanities. This short paper and demo present how the integrated toolchain connects: 1) OpenRefine, for data reconciliation and batch upload; 2) Wikibase, for linked open data (LOD) storage; and 3) Kompakkt, for rendering and annotating 3D models. The presentation is aimed at librarians, digital curators and data managers interested in learning how to manage research datasets containing 3D media and how to make them available within an open data environment with 3D rendering and collaborative annotation features.
Wikidata and Wikibase as complementary research data management services for cultural heritage data
(2022)
The NFDI (German National Research Data Infrastructure) consortia are associations of various institutions within a specific research field that work together to develop common data infrastructures, guidelines, best practices and tools conforming to the principles of FAIR data. Within the NFDI, a common question is: what is the potential of Wikidata to be used as an application for science and research? In this paper, we address this question by tracing current research use cases and applications for Wikidata, its relation to standalone Wikibase instances, and how the two can function as complementary services to meet a range of research needs. This paper builds on lessons learned through the development of open data projects and software services within the Open Science Lab at TIB, Hannover, in the context of NFDI4Culture, the consortium including participants across the broad spectrum of the digital libraries, archives and museums field, and the digital humanities.
Editorial for the 17th European Networked Knowledge Organization Systems Workshop (NKOS 2017)
(2017)
Knowledge Organization Systems (KOS), in the form of classification systems, thesauri, lexical databases, ontologies and taxonomies, play a crucial role in digital information management and applications generally. Carrying semantics in a well-controlled and documented way, Knowledge Organization Systems serve a variety of important functions: tools for the representation and indexing of information and documents, knowledge-based support for information searchers, semantic road maps to domains and disciplines, communication tools providing a conceptual framework, and a conceptual basis for knowledge-based systems, e.g. automated classification systems. New networked KOS (NKOS) services and applications are emerging, and we have reached a stage where many KOS standards exist and the integration of linked services is no longer just a future scenario. This editorial describes the workshop outline and gives an overview of the papers presented at the 17th European Networked Knowledge Organization Systems Workshop (NKOS 2017), which was held during the TPDL 2017 conference in Thessaloniki, Greece.
Editorial for the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016)
(2016)
Knowledge Organization Systems (KOS), in the form of classification systems, thesauri, lexical databases, ontologies and taxonomies, play a crucial role in digital information management and applications generally. Carrying semantics in a well-controlled and documented way, Knowledge Organization Systems serve a variety of important functions: tools for the representation and indexing of information and documents, knowledge-based support for information searchers, semantic road maps to domains and disciplines, communication tools providing a conceptual framework, and a conceptual basis for knowledge-based systems, e.g. automated classification systems. New networked KOS (NKOS) services and applications are emerging, and we have reached a stage where many KOS standards exist and the integration of linked services is no longer just a future scenario. This editorial describes the workshop outline and gives an overview of the papers presented at the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016), which was held in Hannover, Germany.
Fall events and their severe consequences not only pose a serious threat to the affected individual but also place a significant burden on health care systems. Our research aims to elucidate some of the prospects and problems of current sensor-based fall risk assessment approaches. Selected results of a questionnaire-based survey given to experts during topical workshops at international conferences are presented. The majority of domain experts confirmed that fall risk assessment could be valuable for the community and that prediction is deemed possible, though limited. We conclude with a discussion of practical issues concerning adequate outcome parameters for clinical studies and data sharing within the research community. All participants agreed that sensor-based fall risk assessment is a promising and valuable approach, but that more prospective clinical studies with clearly defined outcome measures are necessary.
The number of papers published each year has been increasing for decades. Libraries need to make these resources accessible and available, with classification being an important part of this process. This paper analyzes the prerequisites and possibilities of automatic classification of medical literature. We explain the selection, preprocessing and analysis of data consisting of catalogue records from the library of the Hanover Medical School, Lower Saxony, Germany. In the present study, 19,348 documents, represented by notations of library classification systems such as the Dewey Decimal Classification (DDC), were classified into 514 different classes of the National Library of Medicine (NLM) classification system. The algorithm used was k-nearest-neighbours (kNN). A correct classification rate of 55.7% was achieved. To the best of our knowledge, this is not only the first research on using the NLM classification in automatic classification but also the first approach that relies exclusively on notations already assigned from other classification systems for this purpose.
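A toy Python sketch of this setup (the notations, NLM classes and the single-neighbour setting are placeholders, not data or parameters from the study): documents are represented only by the notations already assigned to them, and a kNN classifier predicts the NLM class.

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.neighbors import KNeighborsClassifier

train_notations = [["610", "616.1"], ["610", "617"], ["004", "610.285"]]   # invented DDC-like codes
train_nlm = ["WG 100", "WO 100", "W 26.5"]                                 # invented NLM classes

mlb = MultiLabelBinarizer()
X = mlb.fit_transform(train_notations)             # multi-hot notation vectors
knn = KNeighborsClassifier(n_neighbors=1).fit(X, train_nlm)

new_doc = mlb.transform([["616.1", "610"]])
print(knn.predict(new_doc))                        # -> ['WG 100']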