This function accepts a Dask DataFrame and the number of new columns to add. It generates uniform random values in [0, 1) with numpy and returns the DataFrame with those columns added, named new_col_0, new_col_1, and so on.
Technology Stack : Dask, numpy, pandas
Code Type : Function
Code Difficulty : Intermediate
import dask.dataframe as dd
import numpy as np
import pandas as pd
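def randomize_dataframe_columns(df, num_new_columns):
    """
    This function takes a Dask DataFrame and adds a specified number of random
    columns with random values to it. The random values are generated using
    numpy.
    """
    new_columns = [f'new_col_{i}' for i in range(num_new_columns)]

    def add_random(part):
        # One row of random values per row of this partition, one column per new name.
        values = np.random.rand(len(part), num_new_columns)
        return part.assign(**{name: values[:, i] for i, name in enumerate(new_columns)})

    # Working per partition avoids computing the lazy row count (df.shape[0])
    # and keeps the original index, so no join on mismatched indexes is needed.
    return df.map_partitions(add_random)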