Goal: Perform a topic modelling analysis using Latent Dirichlet Allocation (LDA) over a set of documents, and review the results with your peers.

Tasks:
  1. Choose a set of texts (corpus) that you want to analyse for topics
  2. Clean the texts by removing formatting and/or HTML tags, ending up with raw text only
  3. Load the texts into a .csv file where the first column is the ID of the document and the second column is its contents (raw text only)
  4. Follow the tutorial at: https://tm4ss.github.io/docs/Tutorial_6_Topic_Models.html, performing the following steps using R:
    1. Text preprocessing: tokenize, remove stop words, perform lemmatization
    2. Model calculation: choose appropriate parameters and build LDA model
    3. Visualization of results: display results using word clouds and topic distribution graphs
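
The tasks above can be sketched end to end in R roughly as follows. This is a minimal illustration under stated assumptions, not the tutorial's own code: the four toy documents, the helper strip_html, the column names doc_id and text, and the choice of k = 2 topics are all hypothetical. It uses the tm and topicmodels packages if they are installed, which may differ from the packages the tutorial relies on. Lemmatization and the word cloud are left out for brevity.

```r
# Hypothetical toy documents; replace these with your own corpus.
raw_docs <- c(
  "<p>The <b>cat</b> sat on the mat.</p>",
  "<div>Stock markets fell sharply today.</div>",
  "<p>Dogs and cats are popular pets.</p>",
  "<p>Investors sold shares as markets dropped.</p>"
)

# Task 2: strip HTML tags with a simple regex and collapse whitespace.
# A regex is enough for simple markup; messy HTML may need a real parser.
strip_html <- function(x) {
  x <- gsub("<[^>]+>", " ", x)        # remove tags
  x <- gsub("&[a-zA-Z]+;", " ", x)    # drop named entities such as &nbsp;
  trimws(gsub("\\s+", " ", x))        # normalise whitespace
}

# Task 3: first column is the document ID, second column is the raw text.
corpus_df <- data.frame(
  doc_id = seq_along(raw_docs),
  text = strip_html(raw_docs),
  stringsAsFactors = FALSE
)
csv_path <- file.path(tempdir(), "corpus.csv")
write.csv(corpus_df, csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back, as you would at the start of the analysis.
textdata <- read.csv(csv_path, stringsAsFactors = FALSE, encoding = "UTF-8")
print(textdata)

# Task 4: preprocessing and LDA, run only if the packages are available.
if (requireNamespace("tm", quietly = TRUE) &&
    requireNamespace("topicmodels", quietly = TRUE)) {
  library(tm)
  library(topicmodels)

  corp <- VCorpus(VectorSource(textdata$text))

  # Tokenize, lowercase, and remove punctuation, numbers and English stop words.
  dtm <- DocumentTermMatrix(corp, control = list(
    tolower = TRUE,
    removePunctuation = TRUE,
    removeNumbers = TRUE,
    stopwords = TRUE
  ))

  # LDA cannot handle documents that are empty after preprocessing.
  dtm <- dtm[rowSums(as.matrix(dtm)) > 0, ]

  # k = 2 is an arbitrary choice for this toy corpus.
  lda <- LDA(dtm, k = 2, method = "Gibbs",
             control = list(seed = 1, iter = 200))

  print(terms(lda, 5))                     # top terms per topic
  print(round(posterior(lda)$topics, 2))   # topic distribution per document
}
```

With a real corpus, the number of topics and the number of Gibbs iterations are the main parameters to experiment with, and the per-document topic matrix is the input for the topic distribution graphs.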

Share your results in the forum and comment on the results of at least one of your fellow participants.


Last modified: Sunday, 9 June 2024, 12:37