Random Forest Feature Importance Calculation

2024-12-16 12:10:44

Code introduction

This code generates a random dataset (when no data is passed in), trains a random forest classifier on it, and uses eli5's PermutationImportance to calculate feature importances. It returns the mean importance of each feature, averaged over the repeated random shuffles of that feature.

Technology Stack : numpy, pandas, eli5, and scikit-learn (RandomForestClassifier).

Code Type : Function

Code Difficulty : Intermediate

import numpy as np
import pandas as pd
from eli5.sklearn import PermutationImportance
from sklearn.ensemble import RandomForestClassifier


def random_feature_importance(X=None, y=None):
    # Fall back to a random dataset only when no data is supplied
    # (the original version silently overwrote the X and y arguments)
    if X is None or y is None:
        rng = np.random.RandomState(42)
        X = pd.DataFrame(rng.rand(100, 20), columns=[f'feature_{i}' for i in range(20)])
        y = rng.randint(0, 2, 100)

    # Train a random forest classifier
    clf = RandomForestClassifier(n_estimators=10, random_state=42)
    clf.fit(X, y)

    # Shuffle each feature in turn and measure the drop in the classifier's score
    perm = PermutationImportance(clf, random_state=42).fit(X, y)

    # feature_importances_ already holds one value per feature: the mean score
    # decrease over the shuffling iterations. Calling .mean(axis=0) on this
    # 1-D array would collapse it into a single number, so return it as-is.
    importances = perm.feature_importances_
    return importances
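The idea behind PermutationImportance can be sketched in plain NumPy: score the fitted model once, then shuffle one column at a time, re-score, and record the drop. Repeating this n_iter times (eli5's default is 5) and averaging gives one value per feature, which is what feature_importances_ holds. This is an illustration under stated assumptions, not eli5's implementation: the function name permutation_importance_sketch and the toy rule_model are hypothetical, and accuracy is assumed as the score (for a scikit-learn classifier, accuracy is what the default score method computes).

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, n_iter=5, seed=0):
    """Hypothetical helper: mean and std of the accuracy drop per shuffled column.

    `predict` is any callable mapping an (n_samples, n_features) array to
    predicted labels. Accuracy as the score is an assumption of this sketch.
    """
    rng = np.random.RandomState(seed)
    base_score = np.mean(predict(X) == y)
    n_features = X.shape[1]
    decreases = np.zeros((n_iter, n_features))
    for i in range(n_iter):
        for j in range(n_features):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            decreases[i, j] = base_score - np.mean(predict(X_perm) == y)
    # Average over iterations: one importance value per feature
    return decreases.mean(axis=0), decreases.std(axis=0)


# Toy data: the label depends only on feature 0
rng = np.random.RandomState(42)
X = rng.rand(200, 3)
y = (X[:, 0] > 0.5).astype(int)


def rule_model(data):
    # Hypothetical stand-in for a fitted model: applies the true rule directly
    return (data[:, 0] > 0.5).astype(int)


mean_imp, std_imp = permutation_importance_sketch(rule_model, X, y)
print(mean_imp)
```

Because rule_model ignores features 1 and 2, shuffling them never changes a prediction, so their importances are exactly zero, while shuffling feature 0 causes a large drop in accuracy. A random forest trained on pure noise, as in the code above, will instead tend to show small importances that hover around zero.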