This paper summarizes the results of a comprehensive statistical analysis of a corpus of open access articles and the figures they contain. It gives insight into quantitative relationships between illustrations or types of illustrations, caption lengths, subjects, publishers, author affiliations, article citations, and other properties.
Discovery and efficient reuse of technology pictures using Wikimedia infrastructures. A proposal
(2016)
Multimedia objects, especially images and figures, are essential for the visualization and interpretation of research findings. The distribution and reuse of these scientific objects are significantly improved under open access conditions, for instance in Wikipedia articles, in research literature, as well as in education and knowledge dissemination, where the licensing of images often represents a serious barrier.
Whereas scientific publications are retrievable through library portals and other online search services thanks to standardized indices, there is as yet no targeted retrieval of and access to the accompanying images and figures. Consequently, there is great demand for standardized indexing methods for these multimedia open access objects in order to improve access to this material.
With our proposal, we hope to serve a broad audience that looks up a scientific or technical term in a web search portal first. Until now, this audience has had little chance of finding an openly accessible and reusable image narrowly matching their search term on the first try; frustratingly, this holds even if such an image is in fact included in some open access article.
NOA is a search engine for scientific images from open access publications, based on full text indexing of all text referring to the images and on filtering by discipline and image type. Images will be annotated with Wikipedia categories for better discoverability and for uploading to Wikimedia Commons. Currently we have indexed approximately 2.7 million images from over 710,000 scientific papers from all fields of science.
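To make the described retrieval concrete, the following is a minimal sketch in Python of caption-based image search with facet filters. The sample records, field names, and the plain inverted index are illustrative assumptions, not NOA's actual schema or indexing stack.

```python
# Minimal sketch: inverted index over caption text plus facet filters.
# Records and field names are hypothetical, not NOA's real schema.
from collections import defaultdict

images = [
    {"id": 1, "caption": "SEM image of graphene oxide flakes",
     "discipline": "materials science", "image_type": "photo"},
    {"id": 2, "caption": "Bar chart of citation counts per year",
     "discipline": "scientometrics", "image_type": "diagram"},
]

# Map each caption term to the ids of the images it occurs with.
index = defaultdict(set)
for img in images:
    for term in img["caption"].lower().split():
        index[term].add(img["id"])

def search(query, discipline=None, image_type=None):
    """Images whose captions contain all query terms, filtered by facets."""
    terms = query.lower().split()
    hits = set.intersection(*(index.get(t, set()) for t in terms)) if terms else set()
    return [img for img in images
            if img["id"] in hits
            and (discipline is None or img["discipline"] == discipline)
            and (image_type is None or img["image_type"] == image_type)]

print(search("graphene oxide", image_type="photo"))  # matches image 1
```

A production system would of course use a full-text engine with stemming and ranking; the sketch only shows how text referring to an image combines with discipline and image-type facets into one query.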
The NOA project collects and stores images from open access publications and makes them findable and reusable. During the project, a focus group workshop was held to determine whether the development addresses researchers' needs. The workshop took place before the second half of the project so that its results could inform further development, since addressing users' needs is a central part of the project. The focus was on finding out what content and functionality researchers expect from image repositories.
In a first step, participants were asked to fill out a survey about their use of images. Secondly, they tested different use cases on the live system. The first finding is that users have a need to find scholarly images, but it is not a routine task, and they often do not know any image repositories. This is another reason for repositories to become more open and to reach users by integrating with other content providers. The second finding is that users paid attention to image licenses but struggled to find and interpret them, and were also unsure how to cite images. In general, there is high demand for reusing scholarly images, but the existing infrastructure has room for improvement.
Scientific papers from all disciplines contain many abbreviations and acronyms. In many cases these acronyms are ambiguous. We present a method to choose the contextually correct definition of an acronym that does not require training for each acronym and can thus be applied to a large number of different acronyms with only a few instances each. We constructed a set of 19,954 examples of 4,365 ambiguous acronyms from image captions in scientific papers, along with their contextually correct definitions, from different domains. We learn word embeddings for all words in the corpus and compare the averaged context vector of the words in the expansion of an acronym with the weighted average vector of the words in the context of the acronym. We show that this method clearly outperforms (classical) cosine similarity. Furthermore, we show that word embeddings learned from a 1 billion word corpus of scientific texts outperform word embeddings learned from much larger general corpora.
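The core comparison can be sketched as follows: average the embeddings of each candidate expansion's words and pick the expansion closest, by cosine similarity, to the weighted average of the context words' embeddings. The random vectors below are stand-ins; the paper's embeddings are trained on a scientific corpus, and the exact context weighting is an assumption here.

```python
# Sketch of embedding-based acronym disambiguation (toy vectors, not the
# trained scientific-corpus embeddings described in the abstract).
import numpy as np

def avg_vector(words, vectors, weights=None):
    """Weighted average of the embeddings of the words known to `vectors`."""
    if weights is None:
        weights = [1.0] * len(words)
    pairs = [(vectors[w], wt) for w, wt in zip(words, weights) if w in vectors]
    if not pairs:
        return None
    vecs, wts = zip(*pairs)
    return np.average(np.stack(vecs), axis=0, weights=wts)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(context_words, candidate_expansions, vectors, context_weights=None):
    """Pick the expansion whose averaged vector best matches the context vector."""
    ctx = avg_vector(context_words, vectors, context_weights)
    best, best_sim = None, -1.0
    for expansion in candidate_expansions:
        exp = avg_vector(expansion.lower().split(), vectors)
        if ctx is None or exp is None:
            continue
        sim = cosine(ctx, exp)
        if sim > best_sim:
            best, best_sim = expansion, sim
    return best

# Illustrative call with random stand-in embeddings.
rng = np.random.default_rng(0)
vectors = {w: rng.random(20) for w in
           "magnetic resonance imaging machine learning scan model".split()}
print(disambiguate(["scan", "imaging"],
                   ["magnetic resonance imaging", "machine learning"], vectors))
```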
Image captions in scientific papers are usually complementary to the images. Consequently, the captions contain many terms that do not refer to concepts visible in the image. We conjecture that it is possible to distinguish between these two types of terms in an image caption by analysing the text alone. To examine this, we evaluated different features. The dataset we used to compute tf.idf values, word embeddings, and concreteness values contains over 700,000 scientific papers with over 4.6 million images. The evaluation was done on a manually annotated subset of 329 images. Additionally, we trained a support vector machine to predict whether a term is likely visible or not. We show that the concreteness of terms is a very important feature for identifying terms in captions and context that refer to concepts visible in images.
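As a rough illustration of the classifier described above, the sketch below trains an SVM on per-term feature vectors combining a tf.idf score, a concreteness value, and a word embedding. The feature layout and the random training data are placeholders, not the paper's annotated dataset.

```python
# Hedged sketch: SVM over per-term features (tf.idf, concreteness, embedding).
# Random placeholder data; the real features come from the 700,000-paper corpus.
import numpy as np
from sklearn.svm import SVC

def term_features(tfidf, concreteness, embedding):
    """Concatenate the scalar features with the term's embedding vector."""
    return np.concatenate(([tfidf, concreteness], embedding))

rng = np.random.default_rng(0)
X = np.stack([term_features(rng.random(), rng.random(), rng.random(50))
              for _ in range(200)])
y = rng.integers(0, 2, size=200)  # 1 = term refers to something visible

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:5]))
```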
As a result of a research semester in the summer of 2022, a bibliography on multimodality in technical communication (TC) is presented. Given that TC primarily involves the development of instructional information, this bibliography is relevant for anyone interested in the use of multimodality in the communication of procedural knowledge. The bibliography is publicly accessible as a Zotero group library (https://bit.ly/multimodality_in_tc) and can be used and expanded.
After a description of the objectives and target group, the five disciplines from which the publications in the bibliography originate are presented. This is followed by information on the structure and search options of the Zotero group library, which are intended to support searching for publications relevant to one's particular research interest. The article concludes with some suggestions for collaborative efforts aimed at further enhancing and expanding the bibliography.
The author actively maintains the group library. Individuals seeking to contribute publications to the group library will receive the appropriate access rights from the author (claudia.villiger@hs-hannover.de). The author aspires to foster collaboration among researchers from diverse fields through this bibliography.
The reuse of scientific raw data is a key demand of Open Science. In the NOA project we foster the reuse of scientific images by collecting them and uploading them to Wikimedia Commons. In this paper we present a text-based annotation method that proposes Wikipedia categories for open access images. The assigned categories can be used for image retrieval or for uploading images to Wikimedia Commons. The annotation consists of two phases: extracting salient keywords and mapping these keywords to categories. The results are evaluated on a small set of open access images that were manually annotated.
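A minimal sketch of the two phases might look like this: rank a caption's tokens by tf.idf to obtain salient keywords, then look them up in a keyword-to-category mapping. The scoring and the tiny mapping table are assumptions; the project derives categories from Wikipedia's own data rather than hard-coding them.

```python
# Sketch of the two-phase annotation: keyword extraction, then category mapping.
from collections import Counter
import math

def salient_keywords(doc_tokens, doc_freq, n_docs, k=5):
    """Rank the tokens of one caption/context by tf.idf; keep the top k."""
    tf = Counter(doc_tokens)
    scores = {t: tf[t] * math.log(n_docs / (1 + doc_freq.get(t, 0))) for t in tf}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical keyword -> category table; the project builds this mapping
# from Wikipedia category data rather than hard-coding it.
keyword_to_category = {
    "graphene": "Category:Graphene",
    "microscopy": "Category:Microscopy",
}

def propose_categories(doc_tokens, doc_freq, n_docs):
    return {keyword_to_category[kw]
            for kw in salient_keywords(doc_tokens, doc_freq, n_docs)
            if kw in keyword_to_category}

tokens = "graphene flakes imaged with electron microscopy".split()
doc_freq = {"graphene": 3, "microscopy": 5, "with": 900}
print(propose_categories(tokens, doc_freq, n_docs=1000))
```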
Wikidata and Wikibase as complementary research data management services for cultural heritage data
(2022)
The NFDI (German National Research Data Infrastructure) consortia are associations of various institutions within a specific research field, which work together to develop common data infrastructures, guidelines, best practices, and tools that conform to the principles of FAIR data. Within the NFDI, a common question is: what is the potential of Wikidata to be used as an application for science and research? In this paper, we address this question by tracing current research use cases and applications for Wikidata, its relation to standalone Wikibase instances, and how the two can function as complementary services to meet a range of research needs. The paper builds on lessons learned through the development of open data projects and software services within the Open Science Lab at TIB, Hannover, in the context of NFDI4Culture, the consortium comprising participants from across the broad spectrum of the digital libraries, archives, and museums field, as well as the digital humanities.
The concreteness of words has been studied extensively in the psycholinguistic literature. A number of datasets have been created with average values for the perceived concreteness of words. We show that we can train a regression model on these data, using word embeddings and morphological features, that predicts these concreteness values with high accuracy. We evaluate the model on 7 publicly available datasets. Predictions of concreteness values are reported in the literature only for a few small subsets of these datasets; our results clearly outperform the reported results for these datasets.
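As a sketch of such a regressor: fit a linear model on word embeddings concatenated with simple morphological indicators and predict a rating per word. Ridge regression, the suffix features, and the toy data below are assumptions standing in for the paper's actual model and the published rating datasets.

```python
# Hedged sketch: regression from embeddings + morphological features to
# concreteness ratings. Toy data; real embeddings and ratings are external.
import numpy as np
from sklearn.linear_model import Ridge

SUFFIXES = ("ness", "ity", "tion", "ing")  # crude abstractness indicators

def morph_features(word):
    return np.array([word.endswith(s) for s in SUFFIXES], dtype=float)

def features(word, vectors):
    return np.concatenate((vectors[word], morph_features(word)))

rng = np.random.default_rng(1)
vectors = {w: rng.random(50) for w in ["table", "happiness", "stone", "idea"]}
ratings = {"table": 4.9, "happiness": 1.8, "stone": 4.8, "idea": 1.6}

X = np.stack([features(w, vectors) for w in ratings])
y = np.array([ratings[w] for w in ratings])
model = Ridge().fit(X, y)
print(dict(zip(ratings, model.predict(X).round(2))))
```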