Wikidata and Wikibase as complementary research data management services for cultural heritage data
(2022)
The NFDI (German National Research Data Infrastructure) consortia are associations of various institutions within a specific research field, which work together to develop common data infrastructures, guidelines, best practices and tools that conform to the principles of FAIR data. Within the NFDI, a common question is: What is the potential of Wikidata to be used as an application for science and research? In this paper, we address this question by tracing current research use cases and applications for Wikidata, its relation to standalone Wikibase instances, and how the two can function as complementary services to meet a range of research needs. This paper builds on lessons learned through the development of open data projects and software services within the Open Science Lab at TIB, Hannover, in the context of NFDI4Culture, the consortium that includes participants from across the broad spectrum of the digital libraries, archives, and museums field, and the digital humanities.
The NOA project collects and stores images from open access publications and makes them findable and reusable. During the project a focus group workshop was held to determine whether the development is addressing researchers’ needs. This took place before the second half of the project so that the results could be considered for further development since addressing users’ needs is a big part of the project. The focus was to find out what content and functionality they expect from image repositories.
In a first step, participants were asked to fill out a survey about their use of images. Secondly, they tested different use cases on the live system. The first finding is that users need to find scholarly images, but this is not a routine task for them and they often do not know any image repositories. This is another reason for repositories to become more open and to reach users by integrating with other content providers. The second finding is that users paid attention to image licenses but struggled to find and interpret them, and were also unsure how to cite images. In general, there is a high demand for reusing scholarly images, but the existing infrastructure has room to improve.
Scientific papers from all disciplines contain many abbreviations and acronyms. In many cases these acronyms are ambiguous. We present a method to choose the contextually correct definition of an acronym that does not require training for each acronym and thus can be applied to a large number of different acronyms with only a few instances. We constructed a set of 19,954 examples of 4,365 ambiguous acronyms from image captions in scientific papers along with their contextually correct definition from different domains. We learn word embeddings for all words in the corpus and compare the averaged context vector of the words in the expansion of an acronym with the weighted average vector of the words in the context of the acronym. We show that this method clearly outperforms (classical) cosine similarity. Furthermore, we show that word embeddings learned from a 1 billion word corpus of scientific texts outperform word embeddings learned from much larger general corpora.
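The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional embeddings, the per-word weights, the candidate expansions, and the use of cosine similarity between the two averaged embedding vectors are all assumptions made for illustration, since the abstract does not specify the weighting scheme or the similarity measure.

```python
import math

def average(vectors, weights=None):
    """Return the (optionally weighted) average of equal-length vectors."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def disambiguate(context_words, candidates, embeddings, weights):
    """Pick the candidate expansion whose averaged word vector is closest
    to the weighted average vector of the context words."""
    known = [w for w in context_words if w in embeddings]
    if not known:
        return None  # no usable context; the sketch does not guess
    context_vec = average([embeddings[w] for w in known],
                          [weights.get(w, 1.0) for w in known])

    def score(expansion):
        words = [w for w in expansion.lower().split() if w in embeddings]
        if not words:
            return -1.0  # expansion has no known words; rank it last
        return cosine(average([embeddings[w] for w in words]), context_vec)

    return max(candidates, key=score)

# Hypothetical toy data: two "topics" along two dimensions.
embeddings = {
    "magnetic": [0.9, 0.1], "resonance": [0.8, 0.2],
    "patient": [0.8, 0.1], "scanner": [0.9, 0.2],
    "mean": [0.1, 0.9], "rank": [0.1, 0.8],
    "retrieval": [0.2, 0.9], "query": [0.1, 0.7],
}
weights = {"patient": 1.5, "scanner": 2.0, "retrieval": 2.0, "query": 1.5}
candidates = ["magnetic resonance", "mean rank"]

medical = disambiguate(["patient", "scanner"], candidates, embeddings, weights)
search = disambiguate(["retrieval", "query"], candidates, embeddings, weights)
```

Because no model is trained per acronym, the same function can score any acronym for which candidate expansions and word vectors are available, which matches the motivation given in the abstract.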
In this poster we present the ongoing development of an integrated free and open source toolchain for semantic annotation of digitised cultural heritage. The toolchain development involves the specification of a common data model that aims to increase interoperability across diverse datasets and to enable new collaborative research approaches.
Research information, i.e., data about research projects, organisations, researchers or research outputs such as publications or patents, is spread across the web, usually residing in institutional and personal web pages or in semi-open databases and information systems. While there exists a wealth of unstructured information, structured data is limited and often exposed following proprietary or less-established schemas and interfaces. Therefore, a holistic and consistent view on research information across organisational and national boundaries is not feasible. On the other hand, web crawling and information extraction techniques have matured throughout the last decade, allowing for automated approaches of harvesting, extracting and consolidating research information into a more coherent knowledge graph. In this work, we give an overview of the current state of the art in research information sharing on the web and present initial ideas towards a more holistic approach for boot-strapping research information from available web sources.
The reuse of scientific raw data is a key demand of Open Science. In the project NOA we foster reuse of scientific images by collecting and uploading them to Wikimedia Commons. In this paper we present a text-based annotation method that proposes Wikipedia categories for open access images. The assigned categories can be used for image retrieval or to upload images to Wikimedia Commons. The annotation basically consists of two phases: extracting salient keywords and mapping these keywords to categories. The results are evaluated on a small record of open access images that were manually annotated.
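The two phases named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the frequency-based keyword extraction, the small stop word list, and the hand-written `category_index` lookup are hypothetical stand-ins, since the abstract does not describe how salient keywords are selected or how they are matched to Wikipedia categories.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list.
STOPWORDS = {"the", "of", "a", "in", "and", "is", "with", "on", "for", "to"}

def extract_keywords(text, top_n=3):
    """Phase 1 (assumed): take the most frequent non-stop-word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

def map_to_categories(keywords, category_index):
    """Phase 2 (assumed): look each keyword up in a keyword-to-category index."""
    categories = []
    for keyword in keywords:
        for category in category_index.get(keyword, []):
            if category not in categories:
                categories.append(category)
    return categories

# Hypothetical caption and category index for illustration only.
caption = "Microscopy image of a neuron. The neuron is stained with dye."
category_index = {"neuron": ["Neurons"], "microscopy": ["Microscopy"]}

keywords = extract_keywords(caption)
categories = map_to_categories(keywords, category_index)
```

Keeping the two phases as separate functions means either one can be swapped out, for example replacing the frequency count with a different salience measure, without touching the other.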
To learn a subject, the acquisition of the associated technical language is important.
Despite this widely accepted importance of learning the technical language, hardly any published studies describe the characteristics of the technical languages that students are supposed to learn. This might largely be due to the absence of specialized text corpora for studying such languages at the lexical, syntactic and textual levels. In the present paper we describe a corpus of German physics texts that can be used to study the language used in physics. A large and a small variant are compiled. The small version of the corpus consists of 5.3 million words and is available on request.
The proceedings of Teaching Trends 2018 offer all readers fascinating insights into campus-based universities that use a variety of digital media in well-designed scenarios to help their students acquire competencies. Taking a broad view of digitalisation, the conference contributions address new learning formats such as blended learning and the inverted classroom, their current legal framework in the GDPR and copyright law, and their technical foundations, e.g. in augmented/virtual reality or audience response. Beyond this, overarching strategies and development concepts that lead the university into a digital future are also discussed. In all areas, the speakers reported both directly from their teaching practice and from the accompanying research. To round off the conference, the editors have put into writing the opening debate on the significance of the digital transformation for universities, the panel discussion on the resulting challenges for studying, and a keynote on the architecture of learning spaces.
This paper deals with new job profiles in libraries, mainly systems librarians (German: Systembibliothekare), IT librarians (German: IT-Bibliothekare) and data librarians (German: Datenbibliothekare). It investigates the vacancies and requirements of these positions in the German-speaking countries by analyzing one hundred and fifty job advertisements published on OpenBiblioJobs between 2012 and 2016. In addition, the distribution of positions, the employing institutions, the different job titles as well as time limits, scope of work and remuneration of the positions are evaluated. The analysis of the remuneration in the public sector in Germany also provides information on demands for a bachelor's or master's degree.
The average annual increase in job vacancies between 2012 and 2016 is 14.19%, confirming the need and necessity of these professional library profiles.
The higher remuneration of the positions in data management, in comparison to the systems librarian, shows that a master's degree is a prerequisite and thus indicates a desideratum due to missing or few master's degree courses. Accordingly, the range of bachelor's degree courses (or IT-oriented major areas of study with compulsory elective modules in existing bachelor's degree courses) for systems and IT librarians must be further expanded. An alternative could also be modular education programs for librarians and information scientists with professional experience, as is already the case for music librarians.