Fairseq-Based Text Translation Using a Pre-Trained Model


Code introduction


This function uses the Fairseq library to load a pre-trained translation model and the target-language dictionary, encodes the input text into the token-ID format the model expects, runs the model to translate it, and finally decodes the output tokens back into plain text.


Technology Stack : Fairseq, PyTorch, Dictionary

Code Type : Machine translation

Code Difficulty : Intermediate


import torch
from fairseq.models.transformer import TransformerModel
from fairseq.data import Dictionary

def translate_text(input_text, model_path, target_dict_path):
    """
    Translate input_text with a pre-trained Fairseq model stored at model_path,
    decoding the output with the target-language dictionary at target_dict_path.
    """
    # Load the target-language dictionary (maps token IDs back to words)
    target_dict = Dictionary.load(target_dict_path)

    # Load the pre-trained model; from_pretrained returns a hub interface
    # that bundles the model with its tokenization and generation logic
    model = TransformerModel.from_pretrained(model_path)
    model.eval()

    # Encode the input text into the tensor of token IDs the model expects
    input_tokens = model.encode(input_text)

    # Generate the translation without tracking gradients
    with torch.no_grad():
        hypos = model.generate(input_tokens, beam=5)

    # Decode the best-scoring hypothesis back into text, skipping special symbols
    output_text = target_dict.string(hypos[0]["tokens"])

    return output_text
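
The encode/decode round trip above is the core idea: a dictionary maps words to integer IDs on the way in and back to words on the way out. The following is a minimal stdlib-only stand-in for that idea, to show what the dictionary is doing conceptually; it is an illustration, not the fairseq `Dictionary` API, and all names in it (`build_vocab`, `encode_line`, `decode_line`) are hypothetical helpers.

```python
# Minimal stand-in for a token dictionary: maps words to integer IDs and back.
# Illustrative only -- fairseq's Dictionary adds BPE handling, serialization, etc.

EOS = 2  # conventional end-of-sentence ID

def build_vocab(corpus):
    # Reserve low IDs for special symbols, as fairseq does
    vocab = {"<pad>": 0, "<unk>": 1, "</s>": EOS}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab

def encode_line(vocab, text, append_eos=True):
    # Unknown words fall back to the <unk> ID
    ids = [vocab.get(w, vocab["<unk>"]) for w in text.split()]
    if append_eos:
        ids.append(EOS)
    return ids

def decode_line(vocab, ids):
    # Invert the vocab and drop the end-of-sentence marker
    rev = {i: w for w, i in vocab.items()}
    return " ".join(rev[i] for i in ids if i != EOS)

vocab = build_vocab("hello world how are you")
ids = encode_line(vocab, "hello world")
print(ids)                      # [3, 4, 2]
print(decode_line(vocab, ids))  # hello world
```

In the real function, `model.encode` additionally applies the model's tokenization (e.g. BPE) before the ID lookup, which is why encoding is best left to the model rather than done with a raw dictionary.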