Randomize DataFrame Columns in Python


Code introduction


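This function takes a DataFrame and the number of columns to randomize. It shuffles the values within each of the first num_columns columns, permuting each column independently while leaving the index and the remaining columns unchanged, and returns the result as a new DataFrame. The input DataFrame is not modified.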
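Here, shuffling a column means permuting its values while the row index and the other columns stay where they are. The sketch below shows that single step on a made-up two-column DataFrame. It uses NumPy's seeded Generator (np.random.default_rng) so that the run can be repeated. That choice is an assumption for illustration; the original function calls np.random.permutation.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; any DataFrame would do.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

rng = np.random.default_rng(0)  # seeded so the run is repeatable
shuffled = df.copy()

# Permute only the values of column "a"; column "b" and the index stay put,
# so the values in "a" no longer line up row-by-row with "b".
shuffled["a"] = rng.permutation(shuffled["a"].to_numpy())

print(shuffled)
# Column "a" still holds the same values, only in a different order.
print(sorted(shuffled["a"]) == sorted(df["a"]))  # True
```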


Technology stack: pandas, NumPy

Code type: Custom function

Code difficulty: Intermediate


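```python
import numpy as np
import pandas as pd


def randomize_dataframe_columns(df, num_columns):
    """
    Shuffle the values within each of the first num_columns columns of df.

    Each selected column is permuted independently, so its values no longer
    line up with the other columns; the index and the remaining columns are
    unchanged. A new DataFrame is returned and df itself is not modified.
    """
    # Reject negative counts too: df.columns[:-k] would silently select
    # every column except the last k.
    if not 0 <= num_columns <= df.shape[1]:
        raise ValueError(
            "num_columns must be between 0 and the number of columns in the DataFrame."
        )

    # Work on a copy so the caller's DataFrame is left intact.
    df_randomized = df.copy()

    # Permute each selected column on its own; converting to a NumPy array
    # keeps pandas from re-aligning the shuffled values back to the index.
    for column in df.columns[:num_columns]:
        df_randomized[column] = np.random.permutation(df_randomized[column].to_numpy())

    return df_randomized
```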
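The function above always shuffles the first num_columns columns in their existing order. If you want the columns themselves chosen at random, a minimal sketch follows. The function name randomize_random_columns and its seed parameter are hypothetical additions that are not part of the original snippet. The sketch uses NumPy's Generator API (np.random.default_rng, Generator.choice, Generator.permutation) so that passing a seed reproduces the result.

```python
import numpy as np
import pandas as pd


def randomize_random_columns(df, num_columns, seed=None):
    """Hypothetical variant: shuffle the values of num_columns randomly
    chosen columns, each independently, and return a new DataFrame."""
    if not 0 <= num_columns <= df.shape[1]:
        raise ValueError("num_columns must be between 0 and the number of columns.")

    rng = np.random.default_rng(seed)
    # Pick distinct column positions at random (no column is chosen twice).
    positions = rng.choice(df.shape[1], size=num_columns, replace=False)

    df_randomized = df.copy()
    for pos in positions:
        column = df.columns[pos]
        # Shuffle this column's values; the index itself is not permuted.
        df_randomized[column] = rng.permutation(df_randomized[column].to_numpy())
    return df_randomized


# Hypothetical example data.
df = pd.DataFrame({"a": range(6), "b": range(10, 16), "c": range(20, 26)})
out1 = randomize_random_columns(df, 2, seed=42)
out2 = randomize_random_columns(df, 2, seed=42)
print(out1.equals(out2))  # same seed, same result: True
```

Passing seed=None gives a different result on each call. With the original function, calling np.random.seed(...) beforehand is the corresponding way to get repeatable output, because np.random.permutation draws from NumPy's global legacy generator.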