Random File Selection and Processing in GCS with Luigi

Code introduction


This code defines a Luigi task that randomly selects files from Google Cloud Storage (GCS) and processes them. The task first lists the file paths under a GCS location, then runs a processing function on each selected file.


Technology Stack : Luigi, GCSClient (luigi.contrib.gcs)

Code Type : Luigi Task

Code Difficulty : Intermediate


                
                    
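import random

import luigi
from luigi.contrib.gcs import GCSClient


def list_files(prefix):
    """Return the object paths found under a gs:// prefix."""
    # No credentials are passed, so the client relies on the environment's defaults
    client = GCSClient()
    # listdir() yields one path per object under the prefix
    return sorted(client.listdir(prefix))


def process_file(file_path):
    """Placeholder processing step; replace with real work."""
    return f"Processed {file_path}"


class MyTask(luigi.Task):
    # Illustrative parameters (assumed here, adjust to your setup)
    prefix = luigi.Parameter()  # e.g. "gs://my-bucket/data/"
    sample_size = luigi.IntParameter(default=5)

    def run(self):
        # requires() must return Task objects, so the listing happens in run()
        files = list_files(self.prefix)
        # Pick up to sample_size files at random, without repeats
        chosen = random.sample(files, min(self.sample_size, len(files)))
        for file_path in chosen:
            print(process_file(file_path))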
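
The random-selection step can be tried without GCS credentials by passing the listing function in as an argument. The following is a minimal sketch under stated assumptions: `select_random_files`, `fake_listdir`, the `seed` argument, and the `gs://example-bucket/data/` prefix are hypothetical names introduced for illustration and are not part of Luigi or the snippet above.

```python
import random
from typing import Callable, Iterable, List, Optional


def select_random_files(
    list_paths: Callable[[str], Iterable[str]],
    prefix: str,
    sample_size: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Pick up to sample_size distinct paths from list_paths(prefix)."""
    # Sort first so that a fixed seed always selects the same files
    paths = sorted(list_paths(prefix))
    rng = random.Random(seed)
    # Never ask for more items than the listing contains
    return rng.sample(paths, min(sample_size, len(paths)))


def fake_listdir(prefix: str) -> Iterable[str]:
    """Hypothetical stand-in for a real GCS listing call."""
    return (f"{prefix}file_{i}.txt" for i in range(10))


if __name__ == "__main__":
    chosen = select_random_files(fake_listdir, "gs://example-bucket/data/", 3, seed=42)
    for path in chosen:
        print(f"Processed {path}")
```

In a real task, a bound listing method of a GCS client could presumably be passed in place of `fake_listdir`; keeping the lister as an argument is a design choice that makes the selection logic easy to unit test.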

# Code Information