Word Frequency Count with Stopword Removal

Code introduction


This function takes a text string and a language code, tokenizes the text with NLTK, drops stopwords and non-alphanumeric tokens such as punctuation, and returns a collections.Counter mapping each remaining word to its frequency.


Technology Stack : collections, nltk.corpus.stopwords, nltk.tokenize.word_tokenize

Code Type : Function

Code Difficulty : Intermediate


                
                    
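def random_word_count(text, language='en'):
    from collections import Counter
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    # NLTK names its stopword lists by full language name ('english'), not by
    # ISO code, so translate a few common codes and pass other values through.
    language_names = {'en': 'english', 'fr': 'french', 'de': 'german', 'es': 'spanish'}
    language = language_names.get(language, language)

    # Tokenize the text
    tokens = word_tokenize(text)

    # Filter out punctuation and stopwords; compare in lowercase because
    # NLTK's stopword lists are lowercase
    stop_words = set(stopwords.words(language))
    filtered_tokens = [word for word in tokens if word.isalnum() and word.lower() not in stop_words]

    # Count the words
    word_counts = Counter(filtered_tokens)

    return word_counts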
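
The tokenize, filter, count pipeline above can also be tried without NLTK installed. The following is a minimal sketch using only the standard library. The name simple_word_count and the tiny SAMPLE_STOPWORDS set are hypothetical stand-ins, and the regular expression is only a rough substitute for word_tokenize. Unlike the function above, this sketch lowercases every token before counting, which is an assumption about the desired output.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only;
# NLTK's English list is much longer.
SAMPLE_STOPWORDS = {"the", "a", "an", "and", "is", "on", "of", "to", "in"}

def simple_word_count(text, stop_words=SAMPLE_STOPWORDS):
    """Count words in text, skipping stopwords (case-insensitive)."""
    # Rough stand-in for word_tokenize: keep runs of ASCII letters and digits,
    # which also discards punctuation.
    tokens = re.findall(r"[A-Za-z0-9]+", text)

    # Lowercase each token so "The" and "the" are treated as the same word
    kept = [t.lower() for t in tokens if t.lower() not in stop_words]

    return Counter(kept)

counts = simple_word_count("The cat sat on the mat, and the cat slept.")
print(counts.most_common(1))  # [('cat', 2)]
```

Running the NLTK-based function itself requires the stopword and tokenizer data to be present locally, typically fetched once with nltk.download('stopwords') and nltk.download('punkt'); newer NLTK releases may ask for 'punkt_tab' instead.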