Random Text Generation Using Keras Tokenizer

Code introduction


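This function fits a Keras Tokenizer on a synthetic vocabulary ("word0", "word1", ...), draws random word indices with NumPy, and decodes them into a space-separated string of words. The length argument sets how many indices are sampled, and top_k bounds the part of the vocabulary they are drawn from. pad_sequences is applied to the single sampled sequence, which leaves its length unchanged.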


Technology Stack : Keras, Tokenizer, pad_sequences, NumPy

Code Type : Generate random text

Code Difficulty : Intermediate


                
                    
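        def random_word_generator(length, top_k=100):
            # Legacy Keras 2.x preprocessing API; newer Keras releases may not
            # expose these modules under this import path.
            from keras.preprocessing.text import Tokenizer
            from keras.preprocessing.sequence import pad_sequences
            import numpy as np

            # Build a synthetic vocabulary: "word0", "word1", ..., "word9999"
            words = ["word" + str(i) for i in range(10000)]

            # Fit the tokenizer. Keras keeps only indices below num_words and
            # index 0 is reserved, so top_k usable words require top_k + 1.
            tokenizer = Tokenizer(num_words=top_k + 1)
            tokenizer.fit_on_texts(words)

            # Sample `length` word indices uniformly from 1..top_k (high is exclusive)
            word_sequence = np.random.randint(1, top_k + 1, length)
            # With a single sequence, pad_sequences only wraps it as a (1, length) batch
            padded_sequence = pad_sequences([word_sequence], padding='post')

            # Decode the sampled indices back into a space-separated string
            return tokenizer.sequences_to_texts(padded_sequence.tolist())[0]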
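
The index-to-word lookup behind this function can be sketched without Keras, which may help when checking the expected output. The snippet below is only an illustrative stand-in, not the Keras API: build_index_word and sample_words are hypothetical helper names, indices start at 1 as they do in the Tokenizer, and first-seen order is assumed to match the Tokenizer's ordering because every synthetic word occurs exactly once.

```python
import random


def build_index_word(words):
    """Map 1-based indices to unique words in first-seen order (0 is unused)."""
    word_index = {}
    for word in words:
        if word not in word_index:
            word_index[word] = len(word_index) + 1
    return {index: word for word, index in word_index.items()}


def sample_words(index_word, length, top_k, rng=None):
    """Join `length` words drawn uniformly from indices 1..top_k."""
    rng = rng or random.Random()
    # Never sample an index that the vocabulary does not contain.
    limit = min(top_k, len(index_word))
    if limit == 0:
        return ""
    # random.Random.randint includes both endpoints.
    indices = [rng.randint(1, limit) for _ in range(length)]
    return " ".join(index_word[i] for i in indices)


if __name__ == "__main__":
    vocabulary = build_index_word(["word" + str(i) for i in range(10000)])
    print(sample_words(vocabulary, length=5, top_k=100))
```

Every token in the result comes from "word0" through "word99" when top_k is 100, and the string always contains exactly `length` tokens, which is the behaviour the Keras version aims for.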