This function uses gensim's LDA (Latent Dirichlet Allocation) model to perform topic modeling on a list of texts and returns the most probable words for each topic.
Technology Stack : gensim, numpy
Code Type : Text analysis
Code Difficulty : Intermediate
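import gensim
from gensim.utils import simple_preprocess


def topic_modeling(texts, num_topics=10, num_words=5):
    """
    Perform topic modeling on a list of texts using gensim's LDA (Latent Dirichlet Allocation) model.

    `texts` may contain raw strings (tokenized here with gensim's simple_preprocess)
    or documents that are already lists of tokens.
    Returns a list holding the `num_words` most probable words of each topic.
    """
    # gensim's Dictionary expects every document to be a list of tokens, so tokenize raw strings first.
    tokenized = [simple_preprocess(text) if isinstance(text, str) else list(text) for text in texts]
    # Create a dictionary that maps each unique token to an integer id.
    dictionary = gensim.corpora.Dictionary(tokenized)
    # Create a Bag-of-Words (BoW) representation: a list of (token_id, count) pairs per document.
    corpus = [dictionary.doc2bow(tokens) for tokens in tokenized]
    # Train the LDA model.
    lda_model = gensim.models.LdaModel(corpus,
                                       id2word=dictionary,
                                       num_topics=num_topics,
                                       random_state=100,
                                       update_every=1,
                                       passes=10,
                                       alpha='auto',
                                       per_word_topics=True)
    # Print and collect the most probable words for each topic.
    # With formatted=False, show_topics returns (topic_id, [(word, probability), ...]) pairs
    # instead of the formatted strings produced by print_topics.
    topic_words = []
    for idx, word_probs in lda_model.show_topics(num_topics=-1, num_words=num_words, formatted=False):
        words = [word for word, _ in word_probs]
        print('Topic: {} \nWords: {}'.format(idx, ', '.join(words)))
        topic_words.append(words)
    return topic_words


# Example usage:
# texts = ["This is the first document.", "This document is the second document.", "And this is the third one.",
#          "Is this the first document?", "This is the second document."]
# topic_modeling(texts, num_topics=2)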
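To make the dictionary and Bag-of-Words steps concrete, here is a minimal pure-Python sketch of what they compute: each distinct token gets an integer id, and each document becomes a sorted list of (token_id, count) pairs. The helper names `build_vocabulary` and `to_bow` are hypothetical and not part of gensim. This sketch assigns ids in order of first appearance, which is not necessarily how gensim's `Dictionary` numbers tokens, so actual ids may differ.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Hypothetical helper: give each distinct token an integer id, in order of first appearance."""
    token2id = {}
    for tokens in tokenized_docs:
        for token in tokens:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def to_bow(tokens, token2id):
    """Hypothetical helper: sorted (token_id, count) pairs; tokens missing from the vocabulary are skipped."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


docs = [["this", "document", "is", "the", "second", "document"],
        ["this", "is", "the", "third", "one"]]
vocab = build_vocabulary(docs)
corpus = [to_bow(doc, vocab) for doc in docs]
print(corpus)
# [[(0, 1), (1, 2), (2, 1), (3, 1), (4, 1)], [(0, 1), (2, 1), (3, 1), (5, 1), (6, 1)]]
```

Like `doc2bow`, the sketch stores only the tokens that occur in a document, so "document" appearing twice becomes the single pair `(1, 2)`.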