Adding Random Columns to Dask DataFrame


Code introduction


This function accepts a Dask DataFrame and the number of new columns to add. It uses NumPy to generate random values drawn uniformly from [0, 1), and returns a DataFrame that contains the original columns plus the new ones, named new_col_0, new_col_1, and so on.


Technology Stack : Dask, numpy, pandas

Code Type : Function

Code Difficulty : Intermediate


                
                    
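import dask.dataframe as dd
import numpy as np
import pandas as pd

def randomize_dataframe_columns(df: dd.DataFrame, num_new_columns: int) -> dd.DataFrame:
    """
    Return a Dask DataFrame with `num_new_columns` extra columns of random
    values appended. The values are drawn uniformly from [0, 1) with
    numpy, one partition at a time.
    """
    new_columns = [f'new_col_{i}' for i in range(num_new_columns)]

    def add_random(partition: pd.DataFrame) -> pd.DataFrame:
        # A Dask DataFrame's row count is lazy, so size the random block
        # from each pandas partition: rows x new columns.
        values = np.random.rand(len(partition), num_new_columns)
        return partition.assign(
            **{name: values[:, i] for i, name in enumerate(new_columns)}
        )

    return df.map_partitions(add_random)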
              