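The function accepts a text and an integer n and returns the n most frequently occurring words in the text. Words are counted in plain Python after splitting on whitespace; scikit-learn's CountVectorizer is then restricted to those top words to build the count matrix used for display.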
Technology Stack : Eli5, scikit-learn (CountVectorizer), NumPy
Code Type : Function
Code Difficulty : Intermediate
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer


def random_word_frequency(text, n=5):
    """Return the n most frequent whitespace-separated words in text."""
    # Split the text into words on whitespace (case and punctuation are kept)
    words = text.split()
    # Count every word in a single pass and keep the n most common
    top_n_words = [word for word, count in Counter(words).most_common(n)]
    # Nothing to count: CountVectorizer rejects an empty vocabulary
    if not top_n_words:
        return []
    # Create a CountVectorizer limited to the top n words; the token
    # pattern and lowercase=False make it tokenize the same way as split()
    vectorizer = CountVectorizer(
        vocabulary=top_n_words, lowercase=False, token_pattern=r"\S+"
    )
    # Count the top n words in the original text (columns follow top_n_words)
    X = vectorizer.fit_transform([text])
    # Display the top n words and their frequencies as a plain-text table
    for word, count in zip(top_n_words, X.toarray()[0]):
        print(f"{word}\t{count}")
    return top_n_words
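
For comparison, the same top-n ranking can be sketched with the standard library alone. This is a minimal illustration, not the listing's method: the name top_n_words_stdlib and the sample sentence are hypothetical, and it assumes the same whitespace tokenization with no lowercasing or punctuation stripping.

```python
from collections import Counter


def top_n_words_stdlib(text, n=5):
    """Return (word, count) pairs for the n most frequent words in text.

    Hypothetical helper for illustration; it is not part of Eli5 or scikit-learn.
    """
    # Whitespace tokenization only: "Dog" and "dog," count as different words.
    counts = Counter(text.split())
    # most_common returns pairs sorted by count, highest first.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"  # made-up sample text
    for word, count in top_n_words_stdlib(sample, n=2):
        print(f"{word}\t{count}")
```

Returning the counts together with the words means a caller can build the frequency table without a second pass over the text.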