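This function takes a piece of text and an NLTK stopword language name (for example 'english') as input, and returns the word frequency count as a collections.Counter after removing stopwords and non-alphanumeric tokens such as punctuation from the text.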
Technology Stack : collections, nltk.corpus.stopwords, nltk.tokenize.word_tokenize
Code Type : Function
Code Difficulty : Intermediate
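from collections import Counter

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


def random_word_count(text, language='english'):
    # Requires the NLTK tokenizer models and the 'stopwords' corpus,
    # both installable with nltk.download().
    # NLTK names its stopword lists by language name ('english'), not by code ('en').

    # Tokenize the text
    tokens = word_tokenize(text)

    # Filter out punctuation and stopwords; NLTK stopword lists are lowercase,
    # so each token is lowercased for the comparison only
    stop_words = set(stopwords.words(language))
    filtered_tokens = [word for word in tokens
                       if word.isalnum() and word.lower() not in stop_words]

    # Count the remaining words
    word_counts = Counter(filtered_tokens)
    return word_counts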
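
To show the shape of the result without installing NLTK data, here is a minimal dependency-free sketch of the same filter-and-count idea. It is an illustration only: demo_word_count, DEMO_STOP_WORDS, and the regex tokenizer are hypothetical stand-ins, not part of the function above, and NLTK's real tokenizer and stopword lists behave differently on edge cases.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only;
# NLTK's English stopword list is much larger.
DEMO_STOP_WORDS = {"the", "a", "on", "and", "is"}


def demo_word_count(text, stop_words=DEMO_STOP_WORDS):
    # Rough stand-in for word_tokenize: runs of ASCII letters and digits
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Drop stopwords, lowercasing each token for the comparison only
    filtered = [t for t in tokens if t.lower() not in stop_words]
    return Counter(filtered)


counts = demo_word_count("The cat sat on the mat and the cat slept.")
print(counts["cat"])  # 2
```

The result is a Counter, so a missing word such as counts["the"] reads as 0 rather than raising a KeyError.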