Topic modeling using Latent Dirichlet Allocation (LDA) on Twitter data with Indonesia keyword
DOI: https://doi.org/10.31763/businta.v5i2.455
Keywords: Classification, Text Mining, News Documents, Natural Language Processing, Latent Dirichlet Allocation (LDA)
Abstract
Digital transformation has increased the volume of information in text form, such as news. On social media, and on Twitter in particular, a large amount of news is uploaded very quickly. Twitter serves many users and therefore holds a very large amount of data, which can be used as a news source for online news websites. However, data collected from Twitter covers a wide variety of topics, which makes it difficult to identify the topics in a collected data set, and doing so manually takes a lot of time. Meanwhile, the data may be needed to provide information as quickly as possible. This study aims to classify the topics of data taken from Twitter automatically, so that classifying the collected news is more effective and efficient and takes less time than manual classification. The research was conducted using the Latent Dirichlet Allocation (LDA) method. The news documents to be classified are Indonesian news documents, and they are classified into topics to be determined. Topic modeling with the LDA method yielded 10 topics from 9,094 tweets.
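To illustrate how LDA assigns the words of short documents to topics, the following is a minimal sketch in pure Python using collapsed Gibbs sampling, one standard inference method for LDA. The abstract does not state which implementation or inference method the study used, so this is not a reproduction of it. The function names `fit_lda` and `top_words`, the hyperparameters `alpha`, `beta` and `iterations`, and the toy tokenized tweets are all assumptions made for illustration.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    alpha, beta and iterations are illustrative defaults (assumptions).
    Returns (doc_topic_counts, topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: -row[v])
        result.append([vocab[v] for v in ranked[:n]])
    return result


# Hypothetical, already tokenized toy tweets (not the study's data).
tweets = [
    ["banjir", "jakarta", "hujan", "banjir"],
    ["hujan", "deras", "banjir", "jakarta"],
    ["timnas", "gol", "menang", "timnas"],
    ["gol", "menang", "pertandingan", "timnas"],
]

doc_topic, topic_word, vocab = fit_lda(tweets, num_topics=2)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

In practice, real tweets would first need preprocessing (for example tokenization and stop-word removal), and the number of topics, which the study reports as 10, is passed in as `num_topics` rather than discovered by this sampler itself.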