Luxembourgish language; POS-Tagging; Topic Modeling; Sentiment Analysis; Text Preparation; XML-Database
Abstract :
[en] Despite some recent work, research on the processing of Luxembourgish is still largely in its infancy. While a rich variety of linguistic processing tools exists, especially for English, these tools offer little support for the Luxembourgish language. LuNa (a Tool for Luxembourgish National Corpus) is an Open Toolbox that allows researchers to annotate a text corpus written in Luxembourgish and to build and query an annotated corpus. The aim of the paper is to demonstrate the components of the system and its usage for Machine Learning applications like Topic Modelling and Sentiment Detection. LuNa is based on an XML database, which stores the data and defines the XML schema, and it offers a Graphical User Interface (GUI) for linguistic data preparation steps such as tokenization, Part-Of-Speech tagging, and morphological analysis, to name a few.
Disciplines :
Languages & linguistics; Computer science
Author, co-author :
SIRAJZADE, Joshgun ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
SCHOMMER, Christoph ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
External co-authors :
no
Language :
English
Title :
The LuNa Open Toolbox for the Luxembourgish Language
Publication date :
2019
Event name :
19th Industrial Conference on Data Mining, ICDM 2019
Event place :
New York, United States
Event date :
from 17-07-2019 to 21-07-2019
Audience :
International
Main work title :
Advances in Data Mining, Applications and Theoretical Aspects, Poster Proceedings 2019