We compare the effect of different text segmentation strategies on speech-based passage retrieval of video. Passage retrieval has mainly been studied to improve document retrieval and to enable question answering. In these domains, the best results were obtained using passages defined by the paragraph structure of the source documents or by using arbitrary overlapping passages. For the retrieval of relevant passages in a video using speech transcripts, no author-defined segmentation is available. We compare retrieval results from four different types of segments based on the speech channel of the video: fixed-length segments, a sliding window, semantically coherent segments and prosodic segments. We evaluated the methods on the corpus of the MediaEval 2011 Rich Speech Retrieval task. Our main conclusion is that the retrieval results depend strongly on the right choice of segment length. However, results using the segmentation into semantically coherent parts depend much less on the segment length. In particular, the quality of fixed-length and sliding-window segmentation drops quickly as the segment length increases, whereas the quality of the semantically coherent segments is much more stable. Thus, if coherent segments are defined, longer segments can be used and consequently fewer segments have to be considered at retrieval time.
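To make the two simplest strategies concrete, the following is a minimal sketch of fixed-length and sliding-window segmentation over a tokenized transcript. It is not the authors' implementation: the function names, the choice of measuring segment length in words (rather than in seconds), and the `step` parameter are assumptions made for illustration.

```python
from typing import List


def fixed_length_segments(tokens: List[str], length: int) -> List[List[str]]:
    """Split tokens into consecutive, non-overlapping segments of `length` words.

    The last segment may be shorter than `length`.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [tokens[i:i + length] for i in range(0, len(tokens), length)]


def sliding_window_segments(tokens: List[str], length: int, step: int) -> List[List[str]]:
    """Return overlapping windows of `length` words, advancing `step` words each time.

    Assumption: if the regular windows do not reach the end of the transcript,
    one final window covering the last `length` words is appended.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if not tokens:
        return []
    if len(tokens) <= length:
        return [list(tokens)]
    segments = []
    last_start = 0
    for start in range(0, len(tokens) - length + 1, step):
        segments.append(tokens[start:start + length])
        last_start = start
    if last_start + length < len(tokens):
        segments.append(tokens[-length:])
    return segments
```

With a larger `length`, both functions produce fewer segments to index, which is the trade-off the abstract discusses: fewer candidates at retrieval time, but (for these two strategies) a faster loss of retrieval quality.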
Distributional semantics tries to characterize the meaning of words by the contexts in which they occur. The similarity of words can hence be derived from the similarity of their contexts. The contexts of a word are usually represented as a vector of the words appearing near that word in a corpus. Previous research observed that similarity measures for the context vectors of two words depend on the frequency of these words. In the present paper we investigate this dependency in more detail for one similarity measure, the Jensen-Shannon divergence. We give an empirical model of this dependency and propose, as an alternative similarity measure, the deviation of the observed Jensen-Shannon divergence from the divergence expected on the basis of the frequencies of the words. We show that this new similarity measure is superior to both the Jensen-Shannon divergence and the cosine similarity in a task in which pairs of words taken from WordNet have to be classified as synonyms or non-synonyms.
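As an illustration of the base measure only, the sketch below computes the Jensen-Shannon divergence between two context vectors given as word-count dictionaries. The function name and the input representation are assumptions; the frequency-dependent expected divergence proposed in the paper is not reproduced here, since the abstract does not give the empirical model.

```python
import math
from typing import Mapping


def jensen_shannon_divergence(p_counts: Mapping[str, float],
                              q_counts: Mapping[str, float]) -> float:
    """Jensen-Shannon divergence (base 2) between two count vectors.

    Counts are normalized to probability distributions first. With base-2
    logarithms the result lies between 0 (identical) and 1 (disjoint support).
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total <= 0 or q_total <= 0:
        raise ValueError("both context vectors need a positive total count")

    jsd = 0.0
    for word in set(p_counts) | set(q_counts):
        p = p_counts.get(word, 0) / p_total
        q = q_counts.get(word, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd
```

Because rare words have sparse, noisy context vectors, a raw divergence computed this way tends to vary with word frequency, which is the dependency the paper sets out to model.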
Primary data is an important source of information for Competitive Intelligence. Traditionally, it has been collected from interviews with stakeholders, talks at conferences and other means of direct interpersonal communication. The role of the Internet in data collection, if it was used at all, was that of a provider of supplementary secondary data. Here, this approach is challenged and, using three examples of Social Media, it is shown that the Internet can and does provide valuable primary information to the Competitive Intelligence professional. Accordingly, a case is made for a shift of focus in the data collection process.
This document describes the work done during the Research Semester in Summer 2006 of Prof. Dr. Stefan Wohlfeil. It is about Security Management tasks and how these tasks might be supported by Open Source software tools. I begin with a short discussion of general management tasks and describe some additional, security-related management tasks. These security-related tasks should then be added to a software tool which already provides the general tasks. Nagios is such a tool. It is extended to perform some of the security-related management tasks as well. I describe the new checking scripts and how Nagios needs to be configured to use these scripts. The work has been done in cooperation with colleagues from the Polytechnic of Namibia in Windhoek, Namibia. This opportunity was also used to establish a partnership between the Department of Computer Science at FH Hannover and the Department of Information Technology at the Polytechnic. A first Memorandum of Agreement lays the groundwork for future staff or student exchange.
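For readers unfamiliar with how a Nagios checking script is shaped, here is a minimal sketch of a security-related check. The check itself (counting world-writable files below a directory), the function names and the thresholds are hypothetical and are not taken from the paper; only the plugin convention is standard: one status line on standard output, optional performance data after a `|`, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```python
import os
import stat

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def count_world_writable(directory: str) -> int:
    """Count regular files below `directory` that any user may write to."""
    count = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(mode) and mode & stat.S_IWOTH:
                count += 1
    return count


def classify(value: int, warn: int, crit: int) -> int:
    """Map a measured value to a Nagios status using two thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_check(directory: str, warn: int, crit: int) -> int:
    """Print the status line Nagios displays and return the exit code."""
    if not os.path.isdir(directory):
        print(f"WRITABLE UNKNOWN - {directory} is not a directory")
        return UNKNOWN
    found = count_world_writable(directory)
    status = classify(found, warn, crit)
    print(f"WRITABLE {LABELS[status]} - {found} world-writable files "
          f"in {directory} | writable={found}")
    return status

# A deployed script would end with something like:
#   sys.exit(run_check("/etc", warn=1, crit=5))
```

Nagios would then invoke such a script through a command definition and attach it to a host via a service definition; the concrete configuration depends on the installation and is described in the paper itself.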