Danubius International Conferences, 11th International Conference The Danube - Axis of European Identity

Survey of Text Mining Research Methods and Their Innovative Applicability

Mihaela Chistol, Mirela Danubianu
Last modified: 2021-06-28

Abstract

Humans are social beings with a strong need to communicate. From the earliest times, the exchange of information has relied on primary skills such as sight and speech. At the beginning of the 20th century, this gave rise to the famous phrase “A picture is worth a thousand words”. In the contemporary world the phrase no longer holds, because the advent of the World Wide Web set off a textual revolution. As digitalization continues at great speed, the need to process the huge volumes of text being generated is felt ever more strongly. Text mining, a new and interesting area of computer science research, is used to address this information overload. This paper presents the methodological and conceptual foundations of text mining along with the main methods behind it. Following an in-depth examination of the literature, the study outlines the fundamental directions of text mining research, such as classification, clustering, and information retrieval. In addition, the article presents state-of-the-art applications that apply text mining to real-world problems.

Acknowledgements: “The work of the first author is supported by the project ANTREPRENORDOC, in the framework of Human Resources Development Operational Programme 2014-2020, financed from the European Social Fund under the contract number 36355/23.05.2019 HRD OP /380/6/13 – SMIS Code: 123847. The work of the second author was carried out in the framework of the research project DREAM (Dynamics of the REsources and technological Advance in harvesting Marine renewable energy), supported by the Romanian Executive Agency for Higher Education, Research, Development and Innovation Funding – UEFISCDI, grant number PN-III-P4-ID-PCE-2020-0008.”