Analyzing Model Performance with SHAP Values

Code introduction


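This code defines a function that fits a tree-based model on the training set, uses the SHAP library to compute SHAP values for both the training set and the test set, and returns those values as DataFrames indexed by feature name together with the model's score on the test set (accuracy when the model is a scikit-learn classifier).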


Technology Stack : SHAP, NumPy, Pandas, scikit-learn

Code Type : Function

Code Difficulty : Intermediate


                
                    
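import numpy as np
import pandas as pd
import shap


def _summarize_shap(shap_values, feature_names):
    """Collapse raw SHAP output to one mean |SHAP| value per feature."""
    if isinstance(shap_values, list):
        # Older SHAP releases: one (n_samples, n_features) array per class
        values = np.stack(shap_values, axis=-1)
    else:
        values = np.asarray(shap_values)
    values = np.abs(values)
    if values.ndim == 3:
        # (n_samples, n_features, n_classes): average over classes
        values = values.mean(axis=-1)
    # Average over samples so each feature gets a single importance score
    return pd.DataFrame(values.mean(axis=0), index=feature_names, columns=['SHAP Values'])


def analyze_model_performance(model, X_train, y_train, X_test, y_test):
    """Fit a tree-based model, summarize its SHAP values, and score it on the test set.

    X_train and X_test are expected to be pandas DataFrames.
    """
    # Train the model
    model.fit(X_train, y_train)

    # TreeExplainer only supports tree-based models (random forests, gradient boosting, ...)
    explainer = shap.TreeExplainer(model)

    # Compute SHAP values for the training set and the test set
    shap_values_train = explainer.shap_values(X_train)
    shap_values_test = explainer.shap_values(X_test)

    # One row per feature: mean absolute SHAP value across all samples
    train_shap_df = _summarize_shap(shap_values_train, X_train.columns)
    test_shap_df = _summarize_shap(shap_values_test, X_test.columns)

    # score() returns accuracy for classifiers and R^2 for regressors
    test_accuracy = model.score(X_test, y_test)

    return train_shap_df, test_shap_df, test_accuracy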
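
The frames returned by the function have feature names as the index and a single 'SHAP Values' column, so a natural next step is to rank features by magnitude. The sketch below is one possible way to do that. The helper name rank_features, the top_k parameter, and the sample numbers and feature names are all hypothetical and serve only as illustration; they are not output from a real model.

```python
import pandas as pd

# Hypothetical stand-in for one of the returned frames:
# feature names as the index, a single 'SHAP Values' column.
shap_df = pd.DataFrame(
    {"SHAP Values": [0.12, -0.40, 0.03, 0.25]},
    index=["age", "income", "tenure", "balance"],
)


def rank_features(shap_df, top_k=3):
    """Return the top_k rows ordered by absolute SHAP value (largest first)."""
    # Sign shows direction of the effect; magnitude shows its strength,
    # so ranking is done on the absolute value.
    ranked = shap_df.assign(abs_shap=shap_df["SHAP Values"].abs())
    ranked = ranked.sort_values("abs_shap", ascending=False)
    return ranked.head(top_k)


top = rank_features(shap_df, top_k=2)
print(top)
```

Ranking on the absolute value keeps strongly negative contributions from being buried at the bottom of the list, while the original signed column is still available in the result.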