Splitting a DataFrame into Subsets Based on a Ratio

Code introduction


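This function takes a DataFrame and a split ratio as input and splits the DataFrame into two subsets. The first subset receives int(len(df) * split_ratio) rows (rounded down) and the second receives the remaining rows, so the first subset is the larger one only when the ratio is greater than 0.5.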


Technology Stack : Pandas, NumPy

Code Type : Data segmentation function

Code Difficulty : Intermediate


                
                    
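import numpy as np


def random_split_dataframe(df, split_ratio):
    """
    Randomly splits a DataFrame into two subsets based on a given split ratio.

    Args:
        df (pandas.DataFrame): The DataFrame to split.
        split_ratio (float): The fraction of rows (between 0 and 1) to include in the first subset.

    Returns:
        tuple: Two DataFrames. The first holds int(len(df) * split_ratio) rows and
            the second holds the remaining rows. The first is the larger subset
            only when split_ratio is greater than 0.5.
    """
    if not 0 <= split_ratio <= 1:
        raise ValueError("split_ratio must be between 0 and 1")

    # Shuffle the row positions so the split is random rather than positional
    shuffled_positions = np.random.permutation(len(df))

    # Calculate the number of rows for the first subset (rounded down)
    num_rows = int(len(df) * split_ratio)

    # Split the DataFrame using the shuffled positions
    first_subset = df.iloc[shuffled_positions[:num_rows]]
    second_subset = df.iloc[shuffled_positions[num_rows:]]

    return first_subset, second_subset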
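
Usage example

The sketch below shows one way a reproducible random split could look using pandas alone. It is an illustration, not part of the original snippet: the helper name sample_split_dataframe, the seed parameter, and the example data are all assumptions made here. It relies on DataFrame.sample to draw the first subset and DataFrame.drop to collect the rest, so it assumes the DataFrame has a unique index.

```python
import pandas as pd


def sample_split_dataframe(df, split_ratio, seed=None):
    # Hypothetical helper for illustration (not from the original snippet).
    # Number of rows for the first subset, rounded down as in the function above.
    num_rows = int(len(df) * split_ratio)
    # Draw the first subset at random; a fixed seed makes the draw repeatable.
    first_subset = df.sample(n=num_rows, random_state=seed)
    # Rows that were not drawn form the second subset (assumes a unique index).
    second_subset = df.drop(first_subset.index)
    return first_subset, second_subset


# Made-up example data: ten rows, split 80/20.
example_df = pd.DataFrame({"value": range(10)})
train_df, test_df = sample_split_dataframe(example_df, 0.8, seed=42)
print(len(train_df), len(test_df))  # prints "8 2"
```

Passing the same seed again should return the same rows, which helps when a split has to be repeated across runs. With seed=None the rows differ from call to call.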
              