Artificial Intelligence: NLP-related task

Goal: Perform a topic modelling analysis of a set of documents using Latent Dirichlet Allocation (LDA), and review the results with your peers.

Tasks:

Choose a set of texts (corpus) that you want to analyse for topics
Clean the texts by removing formatting and/or HTML tags, ending up with raw text only
Load the texts into a .csv file where the first column is the ID of the document and the second column is its contents (raw text only)
Follow the tutorial at: https://tm4ss.github.io/docs/Tutorial_6_Topic_Models.html, performing the following steps using R:
1. Text preprocessing: tokenize, lemmatize, and remove stop words
2. Model calculation: choose appropriate parameters (for example, the number of topics) and build the LDA model
3. Visualization of results: display results using word clouds and topic distribution graphs
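
The corpus preparation tasks (stripping markup and writing the two-column .csv file) might look roughly like this in base R. This is a minimal sketch, not a prescribed method: the input strings, the helper name strip_markup, the column names doc_id and text, and the output path are all assumptions made for illustration. The regex-based tag removal is deliberately naive; for real web pages an HTML parser such as the xml2 or rvest package is likely to be more robust.

```r
# Hypothetical raw inputs; in practice these would be read from files or a scrape.
raw_docs <- c(
  doc1 = "<html><body><h1>Budget</h1><p>Taxes &amp; spending rose.</p></body></html>",
  doc2 = "<p>The team won   the <b>final</b> match.</p>"
)

# Hypothetical helper: naive markup removal with regular expressions.
strip_markup <- function(x) {
  x <- gsub("<[^>]+>", " ", x)              # replace anything that looks like a tag
  x <- gsub("&amp;", "&", x, fixed = TRUE)  # decode one common entity; extend as needed
  x <- gsub("\\s+", " ", x)                 # collapse runs of whitespace
  trimws(x)
}

# First column: document ID. Second column: raw text only.
corpus_df <- data.frame(
  doc_id = names(raw_docs),
  text   = strip_markup(unname(raw_docs)),
  stringsAsFactors = FALSE
)

# Assumed output location; replace with a path of your choice.
csv_path <- file.path(tempdir(), "corpus.csv")
write.csv(corpus_df, csv_path, row.names = FALSE, fileEncoding = "UTF-8")
```

Reading the file back with read.csv(csv_path) is a quick way to check that the IDs and texts survived the round trip before moving on to the tutorial steps.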
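
The three steps (preprocessing, model calculation, visualization) can be sketched in R as follows. This is an illustrative outline under stated assumptions, not the tutorial's own code: it uses the tm and topicmodels packages, a made-up six-document corpus, and arbitrary parameter values (k = 2, 500 Gibbs iterations, alpha = 0.2). Lemmatization and the word cloud rely on the optional textstem and wordcloud packages and are skipped if those are not installed.

```r
library(tm)           # corpus handling and document-term matrix
library(topicmodels)  # LDA()

# Hypothetical toy corpus standing in for the two-column CSV (ID, raw text).
docs <- data.frame(
  doc_id = paste0("d", 1:6),
  text = c(
    "The government raised taxes and cut public spending.",
    "Parliament debated the new tax law and the budget.",
    "The budget vote split the government and parliament.",
    "The striker scored twice and the team won the match.",
    "The coach praised the team after the final match.",
    "Fans cheered as the team lifted the league trophy."
  ),
  stringsAsFactors = FALSE
)

# 1. Preprocessing: lower-case, strip punctuation and numbers,
#    lemmatize (optional package), remove stop words.
corp <- VCorpus(DataframeSource(docs))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeNumbers)
if (requireNamespace("textstem", quietly = TRUE)) {
  corp <- tm_map(corp, content_transformer(textstem::lemmatize_strings))
}
corp <- tm_map(corp, removeWords, stopwords("en"))
corp <- tm_map(corp, stripWhitespace)

# Tokenization happens here: one row per document, one column per term.
dtm <- DocumentTermMatrix(corp)

# 2. Model calculation: k and the control values are assumptions to tune.
k <- 2
lda_fit <- LDA(dtm, k = k, method = "Gibbs",
               control = list(seed = 1234, iter = 500, alpha = 0.2))

post  <- posterior(lda_fit)
beta  <- post$terms    # topics x vocabulary: P(term | topic)
theta <- post$topics   # documents x topics: P(topic | document)
print(terms(lda_fit, 5))  # five most probable terms per topic

# 3. Visualization: stacked bars of topic shares per document.
barplot(t(theta), col = gray.colors(k),
        legend.text = paste("Topic", seq_len(k)),
        ylab = "Topic share", main = "Topic distribution per document")

# Word cloud of topic 1, sized by term probability (optional package).
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = colnames(beta), freq = beta[1, ],
                       min.freq = 0, max.words = 30, random.order = FALSE)
}
```

With a real corpus, the number of topics and the alpha value usually need several trial runs; comparing the top terms per topic across runs is one simple way to judge whether the topics are interpretable.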

Share your results in the forum and comment on the results of at least one of your fellow participants.

Last modified: Sunday, 9 June 2024, 12:37