This paper summarizes the results of a comprehensive statistical analysis of a corpus of open access articles and the figures they contain. It gives insight into quantitative relationships between illustrations or types of illustrations, caption lengths, subjects, publishers, author affiliations, article citations and other properties.
Discovery and efficient reuse of technology pictures using Wikimedia infrastructures. A proposal
(2016)
Multimedia objects, especially images and figures, are essential for the visualization and interpretation of research findings. The distribution and reuse of these scientific objects is significantly improved under open access conditions, for instance in Wikipedia articles, in research literature, as well as in education and knowledge dissemination, where licensing of images often represents a serious barrier.
Whereas scientific publications are retrievable through library portals or other online search services thanks to standardized indices, there is not yet any targeted retrieval of and access to the accompanying images and figures. Consequently, there is a great demand for standardized indexing methods for these multimedia open access objects in order to improve the accessibility of this material.
With our proposal, we hope to serve a broad audience that first looks up a scientific or technical term in a web search portal. Until now, this audience has had little chance of finding an openly accessible and reusable image that closely matches the search term on the first try, which is frustrating when such an image is in fact included in some open access article.
NOA is a search engine for scientific images from open access publications, based on full-text indexing of all text referring to the images and on filtering by discipline and image type. Images will be annotated with Wikipedia categories for better discoverability and for uploading to Wikimedia Commons. Currently we have indexed approximately 2.7 million images from over 710,000 scientific papers from all fields of science.
Scientific papers from all disciplines contain many abbreviations and acronyms, and in many cases these acronyms are ambiguous. We present a method for choosing the contextually correct definition of an acronym that does not require training for each acronym and can thus be applied to a large number of different acronyms with only a few instances each. We constructed a set of 19,954 examples of 4,365 ambiguous acronyms from image captions in scientific papers, along with their contextually correct definitions from different domains. We learn word embeddings for all words in the corpus and compare the averaged context vector of the words in the expansion of an acronym with the weighted average vector of the words in the context of the acronym. We show that this method clearly outperforms (classical) cosine similarity. Furthermore, we show that word embeddings learned from a 1 billion word corpus of scientific texts outperform word embeddings learned from much larger general corpora.
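The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional embeddings, the candidate expansions, the context weights and the helper names (`weighted_average`, `cosine`, `choose_expansion`) are all hypothetical, and cosine similarity between the averaged embedding vectors is assumed as the comparison function because the abstract does not name one.

```python
import math

# Toy 3-dimensional embeddings; all values are made up for illustration.
EMBEDDINGS = {
    "magnetic": [0.9, 0.1, 0.0],
    "resonance": [0.8, 0.2, 0.1],
    "imaging": [0.7, 0.3, 0.0],
    "scanner": [0.8, 0.1, 0.1],
    "brain": [0.6, 0.3, 0.1],
    "mean": [0.0, 0.9, 0.2],
    "response": [0.1, 0.8, 0.3],
    "index": [0.1, 0.9, 0.1],
}


def weighted_average(words, weights=None):
    """Average the embeddings of the known words, optionally weighted per word."""
    total = [0.0, 0.0, 0.0]
    weight_sum = 0.0
    for word in words:
        vector = EMBEDDINGS.get(word)
        if vector is None:
            continue  # skip out-of-vocabulary words
        weight = 1.0 if weights is None else weights.get(word, 1.0)
        total = [t + weight * v for t, v in zip(total, vector)]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [t / weight_sum for t in total]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def choose_expansion(candidates, context_words, context_weights=None):
    """Pick the expansion whose averaged vector is closest to the context vector."""
    context_vector = weighted_average(context_words, context_weights)
    return max(
        candidates,
        key=lambda c: cosine(weighted_average(c.split()), context_vector),
    )


# Hypothetical acronym "MRI" with two candidate expansions.
candidates = ["magnetic resonance imaging", "mean response index"]
context = ["brain", "scanner", "the"]
weights = {"brain": 1.0, "scanner": 2.0}  # assumed weighting scheme
best = choose_expansion(candidates, context, weights)
print(best)
```

Because no model is trained per acronym, a new acronym only needs its candidate expansions and some surrounding context words, which matches the motivation given in the abstract.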
Image captions in scientific papers are usually complementary to the images. Consequently, the captions contain many terms that do not refer to concepts visible in the image. We conjecture that it is possible to distinguish between these two types of terms in an image caption by analysing the text only. To examine this, we evaluated different features. The dataset we used to compute tf.idf values, word embeddings and concreteness values contains over 700,000 scientific papers with over 4.6 million images. The evaluation was done on a manually annotated subset of 329 images. Additionally, we trained a support vector machine to predict whether a term is likely visible or not. We show that concreteness of terms is a very important feature for identifying terms in captions and context that refer to concepts visible in images.
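As an illustration of one of the features named in this abstract, the following sketch computes tf.idf scores for tokenized captions. The abstract does not say which tf.idf variant was used; relative term frequency multiplied by log(N / document frequency) is assumed here, and the toy captions and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenized document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Toy captions, already tokenized; made up for illustration.
captions = [
    ["microscope", "image", "of", "cell"],
    ["image", "of", "spectrum"],
    ["image", "of", "mouse", "cell"],
]
scores = tf_idf(captions)
print(scores[0])
```

Terms that occur in every caption, such as "image" in this toy corpus, receive a score of zero, while terms specific to one caption score highest.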
Wikidata and Wikibase as complementary research data management services for cultural heritage data
(2022)
The NFDI (German National Research Data Infrastructure) consortia are associations of various institutions within a specific research field, which work together to develop common data infrastructures, guidelines, best practices and tools that conform to the principles of FAIR data. Within the NFDI, a common question is: What is the potential of Wikidata as an application for science and research? In this paper, we address this question by tracing current research use cases and applications for Wikidata, its relation to standalone Wikibase instances, and how the two can function as complementary services to meet a range of research needs. This paper builds on lessons learned through the development of open data projects and software services within the Open Science Lab at TIB, Hannover, in the context of NFDI4Culture, the consortium that includes participants across the broad spectrum of the digital libraries, archives, and museums field, and the digital humanities.
The reuse of scientific raw data is a key demand of Open Science. In the project NOA we foster the reuse of scientific images by collecting them and uploading them to Wikimedia Commons. In this paper we present a text-based annotation method that proposes Wikipedia categories for open access images. The assigned categories can be used for image retrieval or for uploading images to Wikimedia Commons. The annotation consists of two phases: extracting salient keywords and mapping these keywords to categories. The results are evaluated on a small set of open access images that were manually annotated.
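The two phases named in this abstract can be sketched as a small pipeline. The abstract does not describe how keywords are extracted or mapped, so everything below is an assumption: keywords are picked by raw frequency after stopword removal, and the mapping is a hand-written lookup table (`CATEGORY_MAP`) that stands in for a real mapping to Wikipedia categories.

```python
from collections import Counter

# Assumed minimal stopword list.
STOPWORDS = {"a", "an", "the", "of", "is", "in", "and"}

# Hypothetical keyword-to-category table, for illustration only.
CATEGORY_MAP = {
    "microscope": "Microscopy",
    "cell": "Cell biology",
    "spectrum": "Spectroscopy",
}


def extract_keywords(text, top_n=3):
    """Phase 1: return the most frequent non-stopword tokens."""
    tokens = [token.strip(".,;:()").lower() for token in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def propose_categories(text):
    """Phase 2: map extracted keywords to categories via the lookup table."""
    keywords = extract_keywords(text)
    return sorted({CATEGORY_MAP[k] for k in keywords if k in CATEGORY_MAP})


caption = "Microscope image of a cell. The cell membrane is stained."
categories = propose_categories(caption)
print(categories)
```

Keeping the two phases in separate functions means either one can be replaced, for example by a stronger keyword extractor, without touching the other.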
Concreteness of words has been studied extensively in the psycholinguistic literature. A number of datasets have been created with average values for the perceived concreteness of words. We show that we can train a regression model on these data, using word embeddings and morphological features, that predicts these concreteness values with high accuracy. We evaluate the model on 7 publicly available datasets. Predictions of concreteness values are reported in the literature only for a few small subsets of these datasets. Our results clearly outperform the reported results for these subsets.
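A minimal sketch of the idea of regressing concreteness on embedding and morphological features follows. It is not the model from the paper: a plain linear regression trained by gradient descent is swapped in, the two-dimensional embeddings and the ratings are made up, and a single suffix indicator (`ABSTRACT_SUFFIXES`) stands in for the morphological features.

```python
# Hypothetical morphological cue: suffixes typical of abstract nouns.
ABSTRACT_SUFFIXES = ("ness", "dom", "ity")

# Toy 2-dimensional embeddings; values are made up for illustration.
EMBEDDINGS = {
    "stone": [0.9, 0.1],
    "table": [0.8, 0.2],
    "freedom": [0.2, 0.8],
    "kindness": [0.1, 0.9],
    "idea": [0.3, 0.7],
    "chair": [0.85, 0.15],
    "sadness": [0.15, 0.85],
}

# Made-up concreteness ratings on a 1 to 5 scale (not taken from any dataset).
TRAINING_RATINGS = {
    "stone": 4.8,
    "table": 4.7,
    "freedom": 1.9,
    "kindness": 1.6,
    "idea": 1.8,
}


def features(word):
    """Embedding, suffix indicator and a constant bias term."""
    has_suffix = 1.0 if word.endswith(ABSTRACT_SUFFIXES) else 0.0
    return EMBEDDINGS[word] + [has_suffix, 1.0]


def train(ratings, learning_rate=0.1, epochs=5000):
    """Fit linear regression weights by batch gradient descent on squared error."""
    rows = [(features(word), rating) for word, rating in ratings.items()]
    weights = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in rows:
            error = sum(w * xi for w, xi in zip(weights, x)) - y
            for i, xi in enumerate(x):
                gradient[i] += error * xi / len(rows)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


def predict(weights, word):
    """Predicted concreteness for a word with a known embedding."""
    return sum(w * xi for w, xi in zip(weights, features(word)))


weights = train(TRAINING_RATINGS)
print(predict(weights, "chair"), predict(weights, "sadness"))
```

In this toy setup the unseen word "chair" is predicted to be more concrete than "sadness", because its embedding lies close to the concrete training words and it lacks the suffix cue.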
The bio-based plastic market is forecast to grow in the coming years. With a growing market share and product range, circular thinking is becoming increasingly important for bio-based plastics as well, in order to enable a sound circular economy for this group of plastics. Therefore, it is important to assess the environmental performance of different end-of-life options for bio-based plastics from an early stage on. This review presents a comprehensive overview of the status quo of different end-of-life options for bio-based plastics from an environmental perspective. Based on the status quo and the corresponding impact assessment results, the global plastic demand, and the technical substitution potential of bio-based plastics, the environmental saving potential of the different end-of-life options was calculated. The review shows that end-of-life assessment focuses on polylactic acid (PLA), with studies covering all end-of-life options. The impact assessment has focused on global warming potential (GWP). With respect to GWP, the analysis of a future global potential of PLA showed the highest saving potential for mechanical recycling, with 94.1 million t CO2-eq. per year in comparison to virgin material.
Digital data on tangible and intangible cultural assets is an essential part of daily life, communication and experience. It has a lasting influence on the perception of cultural identity as well as on the interactions between research, the cultural economy and society. Throughout the last three decades, many cultural heritage institutions have contributed a wealth of digital representations of cultural assets (2D digital reproductions of paintings, sheet music, 3D digital models of sculptures, monuments, rooms, buildings), audio-visual data (music, film, stage performances), and procedural research data such as encoding and annotation formats. The long-term preservation and FAIR availability of research data from the cultural heritage domain is fundamentally important, not only for future academic success in the humanities but also for the cultural identity of individuals and society as a whole. Up to now, no coordinated effort for professional research data management on a national level exists in Germany. NFDI4Culture aims to fill this gap and create a user-centered, research-driven infrastructure that will cover a broad range of research domains from musicology, art history and architecture to performance, theatre, film, and media studies.
The research landscape addressed by the consortium is characterized by strong institutional differentiation. Research units in the consortium's community of interest comprise university institutes, art colleges, academies, galleries, libraries, archives and museums. This diverse landscape is also characterized by an abundance of research objects and methodologies and by a great potential for data-driven research. In a unique effort carried out by the applicant and co-applicants of this proposal and ten academic societies, this community is interconnected for the first time through a federated approach that is ideally suited to the needs of the participating researchers. Promoting collaboration within the NFDI, sharing knowledge and technology, and providing extensive support for users have been the guiding principles of the consortium from the beginning and will be at the heart of all workflows and decision-making processes. Thanks to these principles, NFDI4Culture has gathered strong support ranging from individual researchers to high-level cultural heritage organizations such as UNESCO, the International Council of Museums, the Open Knowledge Foundation and Wikimedia. On this basis, NFDI4Culture will take innovative measures that promote a cultural change towards a more reflective and sustainable handling of research data and at the same time boost qualification and professionalization in data-driven research in the domain of cultural heritage. This will create a long-lasting impact on science, the cultural economy and society as a whole.