Random Feature Selection with XGBoost Importance

Code introduction


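This function trains an XGBoost classifier on the given dataset, ranks the features by importance, and returns the names and importance scores of the top num_features features. Despite the name, the selection is deterministic: the features are sorted by importance and no random sampling takes place.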


Technology Stack : XGBoost, NumPy, pandas (the input data must be a DataFrame, since column names are read from data.columns)

Code Type : Function

Code Difficulty : Intermediate


                
                    
import xgboost as xgb
import numpy as np

def random_xgb_feature_importance(data, label, num_features=5):
    """
    Train an XGBoost classifier and return the top `num_features` features
    ranked by importance, together with their importance scores.

    `data` must be a pandas DataFrame (column names are read from
    `data.columns`); `label` holds the class labels.
    """
    # Initialize the XGBoost classifier
    xgb_clf = xgb.XGBClassifier(eval_metric='mlogloss')

    # Fit the classifier to the data
    xgb_clf.fit(data, label)

    # Get the feature importances (one score per column)
    importances = xgb_clf.feature_importances_

    # Rank features from most to least important and keep the top num_features
    indices = np.argsort(importances)[::-1][:num_features]
    selected_features = [data.columns[i] for i in indices]

    # Return the selected feature names and their importance scores
    return selected_features, importances[indices]
              