Goal: Perform a topic modelling analysis of a set of documents using Latent Dirichlet Allocation (LDA), and review the results with your peers.

Tasks:
  1. Choose a set of texts (corpus) that you want to analyse for topics
  2. Clean the texts by removing formatting and/or HTML tags, so that only raw text remains
  3. Save the texts to a .csv file in which the first column is the document ID and the second column is its content (raw text only)
  4. Follow the tutorial at https://tm4ss.github.io/docs/Tutorial_6_Topic_Models.html and perform the following steps in R:
    1. Text preprocessing: tokenize, perform lemmatization, remove stop words
    2. Model calculation: choose appropriate parameters (e.g. the number of topics K) and build the LDA model
    3. Visualization: display the results using word clouds and topic distribution graphs
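
Tasks 2 and 3 do not have to be done in R. The following is a minimal sketch in Python, using only the standard library, of one way to strip HTML from your documents and write the two-column CSV. It assumes your corpus is available as a list of (document ID, HTML string) pairs; the function names, the header names `doc_id` and `text`, the comma delimiter and the file name are illustrative choices, not part of the assignment or the tutorial.

```python
"""Sketch for tasks 2 and 3: strip HTML from documents and write an (id, text) CSV."""
import csv
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, skipping everything inside <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes entities such as &amp;
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Joining with a space keeps block elements (e.g. two <p>s) apart,
        # at the cost of splitting a word wrapped in an inline tag like <b>.
        return " ".join(self._parts)


def html_to_text(html):
    """Return the raw text of an HTML string with all whitespace runs collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flushes any text still buffered by the parser
    return re.sub(r"\s+", " ", parser.text()).strip()


def write_corpus_csv(docs, path, delimiter=","):
    """Write one row per document: column 1 = document ID, column 2 = raw text."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter)  # quotes fields containing commas/newlines
        writer.writerow(["doc_id", "text"])  # hypothetical header names
        for doc_id, html in docs:
            writer.writerow([doc_id, html_to_text(html)])
```

For example, `write_corpus_csv([("d1", "<h1>Title</h1><p>Body text.</p>")], "corpus.csv")` produces a header row followed by the row `d1,Title Body text.`. The header names match the default field names that quanteda's `corpus()` looks for in a data frame, which may save a renaming step after reading the file in R with `read.csv("corpus.csv", encoding = "UTF-8")`; if the code you adapt from the tutorial expects another separator, change `delimiter` accordingly.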

Share your results in the forum and comment on the results of at least one fellow participant.


Last modified: Sunday, 9 June 2024, 12:37