Extracting N-Grams from Text Using Polyglot

Code introduction


This function extracts n-grams from a given text, where an n-gram is a sequence of n consecutive words. It wraps the input string in a Polyglot Text object and calls that object's ngrams method, which returns every run of n consecutive words in order. With the default n=2, the result is the list of bigrams.


Technology Stack : Polyglot

Code Type : Function

Code Difficulty : Intermediate


                
                    
from polyglot.text import Text


def extract_ngrams(text, n=2):
    """
    Extracts n-grams from a given text.

    Args:
        text (str): The text from which n-grams will be extracted.
        n (int): The number of words in each n-gram. Defaults to 2.

    Returns:
        list: A list of n-grams, each a sequence of n consecutive words.
    """
    # Wrap the raw string in a Polyglot text object
    polyglot_text = Text(text)

    # Extract every run of n consecutive words
    return polyglot_text.ngrams(n)
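
To make the sliding-window idea behind n-grams concrete, here is a minimal sketch in plain Python that needs no Polyglot installation. The helper name simple_ngrams is hypothetical, and splitting on whitespace is an assumption made for illustration only; it does not reproduce Polyglot's tokenization.

```python
def simple_ngrams(text, n=2):
    """Return all runs of n consecutive whitespace-separated words.

    Hypothetical helper for illustration; it does not use Polyglot and
    does not reproduce Polyglot's tokenization.
    """
    if n <= 0:
        return []
    words = text.split()
    # Slide a window of width n across the word list
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


print(simple_ngrams("the quick brown fox", 2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

A text of k words yields k - n + 1 n-grams, and none at all when n is larger than k. Results from Polyglot on real text may differ from this sketch, since its tokenizer may treat punctuation differently from a plain whitespace split.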
              