Artificial Intelligence: LDA - Project Gutenberg - Moby-Dick

Introduction

In this post, I will share the results of a topic modeling analysis using Latent Dirichlet Allocation (LDA) on a corpus derived from The Project Gutenberg eBook of Moby-Dick.

Methodology

The analysis followed these key steps:

Text Collection: A sample of text from "The Project Gutenberg" in the Internet (https://gutenberg.org/files/2701/2701-0.txt) was compiled, encompassing various key events and characters from the series.
Text Preprocessing: The text was cleaned by converting to lowercase, removing punctuation, numbers, stopwords, and applying stemming.
Document-Term Matrix (DTM) Creation: The cleaned text was converted into a Document-Term Matrix.
LDA Model Fitting: The LDA model was fitted with 5 topics.
Visualization: The results were visualized using bar plots for the top terms of each topic, a word cloud for the entire corpus, and a bar plot showing the topic distribution across the corpus. An interactive view Interactive Visualization with LDAvis was also created.

Results

From here, I show the interactive graph where we can select the topics with Slide to adjust relevance metric:

Topic 1

Topic 2

Topic 3

Topic 4

Topic 5

Re: LDA - Project Gutenberg - Moby-Dick

von Joao Pereira - Samstag, 15. Juni 2024, 15:33

nice work

Re: LDA - Project Gutenberg - Moby-Dick

von Ana Guerreiro - Sonntag, 16. Juni 2024, 23:47

Great job

It is a complet analysis with extra analisys work

Regards,
Ana Guerreiro