Randomly Sample 50% of CSV Rows with Pandas


Code introduction


This function uses the pandas library to randomly sample 50% of the rows from an input CSV file, without replacement, and saves the result to an output CSV file. It is useful as a data preprocessing step when a smaller random subset of a dataset is needed.


Technology Stack : pandas, luigi

Code Type : Function

Code Difficulty : Intermediate


                
                    
import pandas as pd

def generate_random_csv(input_file, output_file):
    """
    Write a random 50% sample of the rows in input_file to output_file.
    """
    # Load the whole input CSV into a DataFrame.
    df = pd.read_csv(input_file)
    # Pick half of the rows at random, without replacement.
    random_df = df.sample(frac=0.5)
    # Save the sample, omitting the DataFrame index column.
    random_df.to_csv(output_file, index=False)

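# Code Explanation
# This function takes the path of an input CSV file and the path of an output CSV file as arguments.
# It reads the input CSV file into a pandas DataFrame.
# Then it randomly samples 50% of the rows from the DataFrame, without replacement.
# Because no random_state is passed, each run may select a different set of rows.
# Finally, it writes the sampled DataFrame to the output CSV file without the index column.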
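
The steps described above can also be made repeatable. The following is a minimal sketch, not part of the original snippet: the function name `sample_csv_rows` and the parameters `frac` and `seed` are hypothetical names chosen for illustration. It assumes the input file fits in memory and relies on the `random_state` argument of `DataFrame.sample` to fix the selection.

```python
import pandas as pd


def sample_csv_rows(input_file, output_file, frac=0.5, seed=None):
    """Write a random sample of the rows in input_file to output_file.

    Hypothetical variant of generate_random_csv: `frac` and `seed`
    are illustrative parameters, not part of the original snippet.
    """
    if not 0 < frac <= 1:
        raise ValueError("frac must be in the interval (0, 1]")

    # Load the whole input CSV into memory.
    df = pd.read_csv(input_file)

    # Sample without replacement; passing the same seed as random_state
    # selects the same rows on every run.
    sampled = df.sample(frac=frac, random_state=seed)

    # Save the sample, omitting the DataFrame index column.
    sampled.to_csv(output_file, index=False)
    return sampled
```

Calling `sample_csv_rows("input.csv", "output.csv", seed=42)` twice on the same input should write the same rows both times, which helps when a sampled subset has to be regenerated later. Leaving `seed` as `None` keeps the behavior of the original function.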
