Random DataFrame Column Selection and Filtering

Code introduction


This function randomly selects a specified number of columns from a given DataFrame and filters the rows such that only the rows with at least one non-null value in the selected columns are returned.


Technology Stack : pandas, numpy, random

Code Type : Data filtering

Code Difficulty : Intermediate


import pandas as pd
import numpy as np
import random

def random_dataframe_column_filter(df, num_columns):
    """
    This function randomly selects a specified number of columns from a given dataframe and filters the rows
    such that only the rows with at least one non-null value in the selected columns are returned.
    """
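    # Select random columns; random.sample requires a sequence, so convert the column Index to a list
    selected_columns = random.sample(list(df.columns), num_columns)
    
    # Keep only rows where at least one of the selected columns has a non-null value
    # (axis=0 drops rows; how='all' drops a row only if every selected value is null)
    filtered_df = df[selected_columns].dropna(axis=0, how='all')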
    
    return filtered_df
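
The row filtering described above can be checked on a small DataFrame. The following is a minimal, self-contained sketch and not the original author's code: the helper name `pick_columns_and_filter`, its `seed` parameter, and the sample data are assumptions added for illustration. It assumes row-wise filtering via `dropna(axis=0, how="all")`.

```python
import random

import numpy as np
import pandas as pd


def pick_columns_and_filter(df, num_columns, seed=None):
    """Hypothetical helper: pick random columns, keep rows with any non-null value."""
    # The seed parameter is an assumption, added only to make runs reproducible.
    rng = random.Random(seed)
    # random.sample needs a sequence, so convert the column Index to a list.
    selected = rng.sample(list(df.columns), num_columns)
    # axis=0 with how="all" drops a row only when every selected value is null.
    return df[selected].dropna(axis=0, how="all")


# Illustrative data: row 1 is null in every column, column "c" is entirely null.
sample_df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [np.nan, np.nan, 3.0],
        "c": [np.nan, np.nan, np.nan],
    }
)

# Selecting all three columns makes the result independent of the random draw:
# row 1 is dropped, rows 0 and 2 are kept, and the all-null column "c" remains.
result = pick_columns_and_filter(sample_df, 3, seed=0)
print(sorted(result.columns))  # ['a', 'b', 'c']
print(list(result.index))      # [0, 2]
```

Note that the all-null column `c` is still present in the result: only rows are filtered, never columns. With fewer columns requested, which rows survive depends on the random draw.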
