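This function takes a DataFrame and a split ratio as input and splits the DataFrame into two subsets. The first subset receives int(len(df) * split_ratio) rows and the second receives the rest, so the first subset is the larger one only when the ratio is above 0.5.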
Technology Stack : Pandas, NumPy
Code Type : Data segmentation function
Code Difficulty : Intermediate
def random_split_dataframe(df, split_ratio):
    """
    Randomly splits a DataFrame into two subsets based on a given split ratio.

    Args:
        df (pandas.DataFrame): The DataFrame to split.
        split_ratio (float): The fraction of rows (between 0 and 1) to include in the first subset.

    Returns:
        tuple: Two DataFrames. The first holds int(len(df) * split_ratio) rows, the second holds the remaining rows.
    """
    import numpy as np

    if not 0 <= split_ratio <= 1:
        raise ValueError("split_ratio must be between 0 and 1")
    # Calculate the number of rows for the first subset (rounded down)
    num_rows = int(len(df) * split_ratio)
    # Shuffle the row positions so the split is random, as the function name implies
    shuffled = np.random.permutation(len(df))
    # Split the DataFrame by position
    first_subset = df.iloc[shuffled[:num_rows]]
    second_subset = df.iloc[shuffled[num_rows:]]
    return first_subset, second_subset
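
A random split of this kind can also be sketched with pandas' own DataFrame.sample, which shuffles and selects in one call. The snippet below is a minimal illustration, not part of the original code: the name sample_split_dataframe, the seed parameter, and the ten-row example DataFrame are all hypothetical, and the approach assumes the DataFrame has a unique index.

```python
import pandas as pd


def sample_split_dataframe(df, split_ratio, seed=None):
    """Hypothetical variant: random split via DataFrame.sample.

    Assumes df has a unique index, because the second subset is built by
    dropping the index labels that were drawn for the first subset.
    """
    # Draw roughly split_ratio of the rows at random; a fixed seed makes it repeatable
    first_subset = df.sample(frac=split_ratio, random_state=seed)
    # Every row that was not drawn goes to the second subset, in its original order
    second_subset = df.drop(first_subset.index)
    return first_subset, second_subset


# Illustrative data, not from the original
df = pd.DataFrame({"value": range(10)})
train, test = sample_split_dataframe(df, 0.8, seed=0)
print(len(train), len(test))  # prints: 8 2
```

One difference to keep in mind: DataFrame.sample rounds the row count to the nearest integer, whereas int(len(df) * split_ratio) always rounds down, so the two approaches can differ by one row for some ratios.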