NLP-related task
Goal: Run a topic modelling analysis with Latent Dirichlet Allocation (LDA) on a set of documents, and review the results with your peers.
Tasks:
- Choose a set of texts (corpus) that you want to analyse for topics
- Clean the texts by removing formatting and/or HTML tags, ending up with raw text only
- Save the texts to a .csv file in which the first column is the document ID and the second column is the document's contents (raw text only)
- Follow the tutorial at https://tm4ss.github.io/docs/Tutorial_6_Topic_Models.html, performing the following steps in R:
  - Text preprocessing: tokenize, remove stop words, perform lemmatization
  - Model calculation: choose appropriate parameters (such as the number of topics) and build the LDA model
  - Visualization of results: display the results using word clouds and topic distribution graphs
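
The cleaning and .csv steps above are not covered by the tutorial, so here is a minimal sketch of one way to do them in base R. The sample documents, the `clean_text` helper, the column names `doc_id` and `text`, and the output path are all assumptions made for illustration; check which column names and separator the tutorial code expects before reusing this. The regular expression is only a rough tag stripper and may not cope with malformed HTML.

```r
# Hypothetical raw documents; replace these with your own texts.
raw_docs <- c(
  doc1 = "<html><body><h1>Budget</h1><p>Taxes &amp; spending rose.</p></body></html>",
  doc2 = "<p>The   team won\nthe <b>final</b> match.</p>"
)

# Hypothetical helper: strip tags and two common entities, then tidy whitespace.
clean_text <- function(x) {
  x <- gsub("<[^>]+>", " ", x)               # replace every tag with a space
  x <- gsub("&amp;", "&", x, fixed = TRUE)   # decode the ampersand entity
  x <- gsub("&nbsp;", " ", x, fixed = TRUE)  # decode the non-breaking space entity
  x <- gsub("[[:space:]]+", " ", x)          # collapse runs of whitespace
  trimws(x)                                  # drop leading and trailing spaces
}

# First column: document ID. Second column: raw text only.
corpus_df <- data.frame(
  doc_id = names(raw_docs),
  text   = clean_text(unname(raw_docs)),
  stringsAsFactors = FALSE
)

# Write to a temporary location (assumed path; use your project folder instead).
csv_path <- file.path(tempdir(), "corpus.csv")
write.csv(corpus_df, csv_path, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back to confirm that it has the expected two columns.
check <- read.csv(csv_path, stringsAsFactors = FALSE, encoding = "UTF-8")
print(check)
```

The resulting file can then take the place of the tutorial's example data when you start the preprocessing step.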
Share your results in the forum and comment on the results of at least one of your fellow participants.
Last modified: Sunday, 9 June 2024, 12:37 PM