This code defines a Luigi task that lists files under a randomly generated Google Cloud Storage (GCS) path and processes each one. The task first lists the matching file paths in GCS and then runs a processing function on each file.
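Technology Stack : Luigi, GCSClient, GCSTarget
Code Type : Luigi Task
Code Difficulty : Intermediate
import random

import luigi
from luigi.contrib.gcs import GCSClient, GCSTarget

def random_file_path():
    # "bucket" is a placeholder bucket name; the numeric parts are random.
    return f"bucket/{random.randint(1000, 9999)}/file_{random.randint(1000, 9999)}.txt"

def list_files():
    # GCSClient has no list_buckets()/list_objects(); list_wildcard() yields
    # the gs:// URIs whose names start with the given prefix.
    client = GCSClient()
    return list(client.list_wildcard(f"gs://{random_file_path()}*"))

def process_file(file_path):
    return f"Processed {file_path}"

class GCSFileTask(luigi.ExternalTask):
    # Represents an object that already exists in GCS.
    path = luigi.Parameter()

    def output(self):
        return GCSTarget(self.path)

class MyTask(luigi.Task):
    def requires(self):
        # requires() must return tasks, not raw path strings.
        return [GCSFileTask(path=p) for p in list_files()]

    def run(self):
        for target in self.input():
            print(process_file(target.path))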
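The list-then-process flow described above can be tried without GCS credentials. The following is only a minimal sketch under stated assumptions: FakeGCSClient, its list_prefix method, and process_random_folder are hypothetical names invented for this illustration and are not part of Luigi or of any Google Cloud library. Only process_file mirrors the function in the code above.

```python
import random


class FakeGCSClient:
    """Hypothetical in-memory stand-in for a GCS client (not part of Luigi)."""

    def __init__(self, object_paths):
        self._object_paths = list(object_paths)

    def list_prefix(self, prefix):
        # Return every stored path that starts with the given prefix.
        return [p for p in self._object_paths if p.startswith(prefix)]


def process_file(file_path):
    # Same placeholder processing step as in the task above.
    return f"Processed {file_path}"


def process_random_folder(client, folders, rng):
    # Pick one folder at random, list its objects, and process each one.
    folder = rng.choice(folders)
    return [process_file(path) for path in client.list_prefix(folder)]


if __name__ == "__main__":
    client = FakeGCSClient([
        "gs://bucket/1000/file_1.txt",
        "gs://bucket/1000/file_2.txt",
        "gs://bucket/2000/file_3.txt",
    ])
    rng = random.Random(0)  # seeded so repeated runs pick the same folder
    folders = ["gs://bucket/1000/", "gs://bucket/2000/"]
    for line in process_random_folder(client, folders, rng):
        print(line)
```

Passing the client and the random number generator in as arguments keeps the random choice reproducible and lets the same function be pointed at a real client later, provided that client is wrapped to offer an equivalent listing method.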
# Code Information