Refine
Document Type
- Conference Proceeding (3)
- Bachelor Thesis (1)
- Master's Thesis (1)
Has Fulltext
- yes (5)
Is part of the Bibliography
- no (5)
Keywords
- Klassifikation (5) (remove)
Integrating distributional and lexical information for semantic classification of words using MRMF
(2016)
Semantic classification of words using distributional features is usually based on the semantic similarity of words. We show on two different datasets that a trained classifier using the distributional features directly gives better results. We use Support Vector Machines (SVM) and Multirelational Matrix Factorization (MRMF) to train classifiers. Both give similar results. However, MRMF, that was not used for semantic classification with distributional features before, can easily be extended with more matrices containing more information from different sources on the same problem. We demonstrate the effectiveness of the novel approach by including information from WordNet. Thus we show, that MRMF provides an interesting approach for building semantic classifiers that (1) gives better results than unsupervised approaches based on vector similarity, (2) gives similar results as other supervised methods and (3) can naturally be extended with other sources of information in order to improve the results.
The CogALex-V Shared Task provides two datasets that consists of pairs of words along with a classification of their semantic relation. The dataset for the first task distinguishes only between related and unrelated, while the second data set distinguishes several types of semantic relations. A number of recent papers propose to construct a feature vector that represents a pair of words by applying a pairwise simple operation to all elements of the feature vector. Subsequently, the pairs can be classified by training any classification algorithm on these vectors. In the present paper we apply this method to the provided datasets. We see that the results are not better than from the given simple baseline. We conclude that the results of the investigated method are strongly depended on the type of data to which it is applied.
The amount of papers published yearly increases since decades. Libraries need to make these resources accessible and available with classification being an important aspect and part of this process. This paper analyzes prerequisites and possibilities of automatic classification of medical literature. We explain the selection, preprocessing and analysis of data consisting of catalogue datasets from the library of the Hanover Medical School, Lower Saxony, Germany. In the present study, 19,348 documents, represented by notations of library classification systems such as e.g. the Dewey Decimal Classification (DDC), were classified into 514 different classes from the National Library of Medicine (NLM) classification system. The algorithm used was k-nearest-neighbours (kNN). A correct classification rate of 55.7% could be achieved. To the best of our knowledge, this is not only the first research conducted towards the use of the NLM classification in automatic classification but also the first approach that exclusively considers already assigned notations from other
classification systems for this purpose.
Die allgemeine Digitalisierung und besonders die IT Branche in Hannover, stellen Arbeitgeber_innen und Arbeitnehmer_innen vor große Herausforderungen. Berufsbezeichnungen im IT Sektor zeichnen sich im Gegensatz zu klassischen Berufsfeldern nicht dadurch aus, dass sie vereinheitlicht sind. Unterschiedlichste Berufsbezeichnungen verlangen oftmals identische Kompetenzen. Die Kompetenzen und Fähigkeiten der Arbeitnehmer_innen stehen ebenso immer mehr im Fokus der Arbeitgeber_innen, wie die Bereitschaft der permanenten Weiterbildung.
Zielgebend der vorliegenden Abschlussarbeit ist eine Datenbasis für ein kompetenzbasiertes IT Tool zu liefern, welches den Anspruch hat, die bereits beschriebenen Herausforderungen zu analysieren und zu klassifizieren. Zunächst ist daher eine Klassifikation, der auf dem hannoverschen Jobmarkt gesuchten IT Kompetenzen, zu erstellen. Vorbereitend wird eine Marktanalyse angefertigt, die sowohl Jobsuchmaschinen auf ihre Kompetenzorientierung als auch IT Kompetenzklassifikationen untersucht. Die erstellte Klassifikation bildet anschließend die Grundlage für das Kompetenzmatching zwischen Klassifikation und den Kompetenzen, die hannoversche IT Studierende erlernen, um zu verdeutlichen, in welchen Kompetenzen Weiterbildungsbedarf besteht. Die entstandene Datenbasis wird in einer MySQL Datenbank präsentiert, um eine möglichst flexible Verwendung und Weiterentwicklung des Datenbestands zu ermöglichen.