Adding Random Columns to a Dask DataFrame

Code introduction


This function accepts a Dask DataFrame and an integer specifying how many new columns to add. It generates uniform random values in [0, 1), one per existing row for each new column, and appends the new columns to the original DataFrame column-wise, so the row count is unchanged.


Technology Stack : Dask, Dask DataFrame, NumPy, Pandas

Code Type : Function

Code Difficulty : Intermediate


                
                    
import dask.dataframe as dd
import numpy as np
import pandas as pd

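def randomize_dataframe_columns(df, num_new_columns):
    """
    Add `num_new_columns` columns of uniform random values in [0, 1) to a Dask DataFrame.
    Returns a new, lazily evaluated Dask DataFrame; the input is not modified.
    """
    new_names = [f'new_col_{i}' for i in range(num_new_columns)]

    def _add_random_columns(partition):
        # Each partition is a pandas DataFrame, so its length is known here
        # (df.shape[0] on the Dask DataFrame itself is lazy, not an int)
        random_data = np.random.rand(len(partition), num_new_columns)

        # Reuse the partition's index so the new columns align row by row
        new_columns_df = pd.DataFrame(random_data, columns=new_names, index=partition.index)

        # Concatenate column-wise (axis=1); the default axis=0 would append rows
        return pd.concat([partition, new_columns_df], axis=1)

    # Apply the helper lazily to every partition of the original DataFrame
    result_df = df.map_partitions(_add_random_columns)

    return result_df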