Refine
Document Type
- Conference Proceeding (3)
- Working Paper (1)
Language
- English (4) (remove)
Has Fulltext
- yes (4)
Is part of the Bibliography
- no (4) (remove)
Keywords
- Information Retrieval (4) (remove)
Institute
We present a simple method to find topics in user reviews that accompany ratings for products or services. Standard topic analysis will perform sub-optimal on such data since the word distributions in the documents are not only determined by the topics but by the sentiment as well. We reduce the influence of the sentiment on the topic selection by adding two explicit topics, representing positive and negative sentiment. We evaluate the proposed method on a set of over 15,000 hospital reviews. We show that the proposed method, Latent Semantic Analysis with explicit word features, finds topics with a much smaller bias for sentiments than other similar methods.
We compare the effect of different text segmentation strategies on speech based passage retrieval of video. Passage retrieval has mainly been studied to improve document retrieval and to enable question answering. In these domains best results were obtained using passages defined by the paragraph structure of the source documents or by using arbitrary overlapping passages. For the retrieval of relevant passages in a video, using speech transcripts, no author defined segmentation is available. We compare retrieval results from 4 different types of segments based on the speech channel of the video: fixed length segments, a sliding window, semantically coherent segments and prosodic segments. We evaluated the methods on the corpus of the MediaEval 2011 Rich Speech Retrieval task. Our main conclusion is that the retrieval results highly depend on the right choice for the segment length. However, results using the segmentation into semantically coherent parts depend much less on the segment length. Especially, the quality of fixed length and sliding window segmentation drops fast when the segment length increases, while quality of the semantically coherent segments is much more stable. Thus, if coherent segments are defined, longer segments can be used and consequently less segments have to be considered at retrieval time.
Data and Information Science: Book of Abstracts at BOBCATSSS 2022 Hybrid Conference, 23rd - 25th of May 2022, Debrecen.
This year marks the 30th anniversary of the BOBCATSSS. The BOBCATSSS is an international, annual symposium designed for librarians and information professionals in a rapidly changing environment. Over the past 30 years, the conference has included exciting topics, great venues, interested guests and engaging presenters.
This year we would like to introduce the topics of the many papers presented in the Book of Abstracts for the first time in presence at the University of Debrecen and hybrid. The Book of Abstracts provides an overview of all presentations given at BOBCATSSS. Presentations are listed in alphabetical order by title and include speeches, Pecha Kuchas, posters and workshops.
The theme of BOBCATSSS is Data and Information Science. Data and information are the basis for decisions and processes in business, politics and science. Particularly important in the current era of digital transformation. This is exactly where this year's subthemes come in. They deal with data science, openness as well as institutional roles.
Image captions in scientific papers usually are complementary to the images. Consequently, the captions contain many terms that do not refer to concepts visible in the image. We conjecture that it is possible to distinguish between these two types of terms in an image caption by analysing the text only. To examine this, we evaluated different features. The dataset we used to compute tf.idf values, word embeddings and concreteness values contains over 700 000 scientific papers with over 4,6 million images. The evaluation was done with a manually annotated subset of 329 images. Additionally, we trained a support vector machine to predict whether a term is a likely visible or not. We show that concreteness of terms is a very important feature to identify terms in captions and context that refer to concepts visible in images.