This function reads a CSV file from a specified path and calculates the mean and median of the 'value' column. It uses Dask to read and aggregate the file in partitions, so the data can be processed in parallel and does not have to fit in memory all at once.
Technology Stack : Dask, NumPy, Pandas
Code Type : Function
Code Difficulty : Intermediate
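import dask
import dask.dataframe as dd
import pandas as pd
from dask.distributed import Client


def aggregate_data(file_path):
    """Return the mean and approximate median of the 'value' column of a CSV file."""
    # Start a local Dask client; the context manager shuts it down on exit
    with Client():
        # Read the CSV file lazily into a Dask DataFrame
        df = dd.read_csv(file_path)
        # Build both aggregations lazily. Dask does not implement an exact
        # median across multiple partitions, so use the approximate version.
        mean_task = df['value'].mean()
        median_task = df['value'].median_approximate()
        # Evaluate both results in a single pass over the data
        mean_value, median_value = dask.compute(mean_task, median_task)
    # Return the results as a single-row pandas DataFrame
    return pd.DataFrame({
        'Mean': [mean_value],
        'Median': [median_value]
    })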
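
A median computed over partitioned data is often an approximation, so it can help to compare the function's output against an exact calculation on a small sample file. The sketch below is one possible way to do that using only the standard library. The helper name `reference_stats`, the file name `sample.csv`, and the sample rows are all hypothetical and chosen for illustration; the helper assumes the file has a header row with a numeric 'value' column.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def reference_stats(file_path, column="value"):
    """Exact mean and median of one CSV column, using only the standard library.

    Loads the whole column into memory, so it is meant for small sample files.
    """
    with open(file_path, newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return {"Mean": statistics.fmean(values), "Median": statistics.median(values)}


# Hypothetical sample data, written to a temporary file for illustration
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.csv"
    sample_path.write_text("id,value\n1,10\n2,20\n3,60\n")
    expected = reference_stats(sample_path)

print(expected)  # {'Mean': 30.0, 'Median': 20.0}
```

On a file this small the values returned by the Dask function should match these closely; on large, multi-partition inputs the median may differ slightly while the mean should agree up to floating-point rounding.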