Refine
Document Type
- Article (1)
- Conference Proceeding (1)
Language
- English (2)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2)
Keywords
- User Generated Content (2) (remove)
We compare the effect of different segmentation strategies for passage retrieval of user generated internet video. We consider retrieval of passages for rather abstract and complex queries that go beyond finding a certain object or constellation of objects in the visual channel. Hence the retrieval methods have to rely heavily on the recognized speech. Passage retrieval has mainly been studied to improve document retrieval and to enable question answering. In these domains best results were obtained using passages defined by the paragraph structure of the source documents or by using arbitrary overlapping passages. For the retrieval of relevant passages in a video no author defined paragraph structure is available. We compare retrieval results from 5 different types of segments: segments defined by shot boundaries, prosodic segments, fixed length segments, a sliding window and semantically coherent segments based on speech transcripts. We evaluated the methods on the corpus of the MediaEval 2011 Rich Speech Retrieval task. Our main conclusions are (1) that fixed length and coherent segments are clearly superior to segments based on speaker turns or shot boundaries; (2) that the retrieval results highly depend on the right choice for the segment length; and (3) that results using the segmentation into semantically coherent parts depend much less on the segment length. Especially, the quality of fixed length and sliding window segmentation drops fast when the segment length increases, while quality of the semantically coherent segments is much more stable. Thus, if coherent segments are defined, longer segments can be used and consequently fewer segments have to be considered at retrieval time.
During the Corona-Pandemic, information (e.g. from the analysis of balance sheets and payment behavior) traditionally used for corporate credit risk analysis became less valuable because it represents only past circumstances. Therefore, the use of currently published data from social media platforms, which have shown to contain valuable information regarding the financial stability of companies, should be evaluated. In this data e. g. additional information from disappointed employees or customers can be present. In order to analyze in how far this data can improve the information base for corporate credit risk assessment, Twitter data regarding the ten greatest insolvencies of German companies in 2020 and solvent counterparts is analyzed in this paper. The results from t-tests show, that sentiment before the insolvencies is significantly worse than in the comparison group which is in alignment with previously conducted research endeavors. Furthermore, companies can be classified as prospectively solvent or insolvent with up to 70% accuracy by applying the k-nearest-neighbor algorithm to monthly aggregated sentiment scores. No significant differences in the number of Tweets for both groups can be proven, which is in contrast to findings from studies which were conducted before the Corona-Pandemic. The results can be utilized by practitioners and scientists in order to improve decision support systems in the domain of corporate credit risk analysis. From a scientific point of view, the results show, that the information asymmetry between lenders and borrowers in credit relationships, which are principals and agents according to the principal-agent-theory, can be reduced based on user generated content from social media platforms. In future studies, it should be evaluated in how far the data can be integrated in established processes for credit decision making. Furthermore, additional social media platforms as well as samples of companies should be analyzed. Lastly, the authenticity of user generated contend should be taken into account in order to ensure, that credit decisions rely on truthful information only.