SMU Data Science Review

Abstract

In recent years, the adoption of complex machine learning algorithms, often perceived as “black box” models, has grown rapidly across disciplines. However, the lack of understanding of how these models arrive at their predictions often fosters skepticism and mistrust. In response to the demand for transparency and interpretability, Explainable AI techniques such as SHapley Additive exPlanations (SHAP) have emerged as powerful tools for understanding and trusting these algorithms. However, exact SHAP computation carries an exponential cost, O(2^x), where x is the number of features, which becomes increasingly problematic with the large datasets now standard in most industries. Several frameworks have sought to reduce this computation time and make SHAP tractable, but with limited success. A recently released package, auto-shap, integrates multicore parallelization into the SHAP framework, leveraging multiple CPU cores to compute marginal feature attributions. This study benchmarks that framework on a large dataset using a virtual machine (VM) to evaluate its viability in mitigating the computational load of SHAP calculations. The benchmarking covers feature expansion, row expansion, and core expansion across the VM's 16 cores. Preliminary results show promising improvements: at 10 features, a performance increase of blank percent was observed, and the framework handled the full feature set of 54, whereas Kernel SHAP and Tree SHAP became intractable beyond 15 features and 100,000 observations.
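To illustrate the parallelization strategy the abstract describes, the following is a minimal sketch of splitting a dataset into row blocks and computing Tree SHAP attributions for each block on a separate CPU core. It assumes scikit-learn and the shap package are installed; the model, data, and chunking scheme are illustrative, and this is a sketch of the general technique, not auto-shap's actual implementation.

    import multiprocessing as mp

    import numpy as np
    import pandas as pd
    import shap
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    def explain_chunk(args):
        # Each worker builds its own TreeExplainer and scores its block of rows.
        model, chunk = args
        explainer = shap.TreeExplainer(model)
        return explainer.shap_values(chunk)

    if __name__ == "__main__":
        # Illustrative data and model; any tree-based model would do here.
        X, y = make_regression(n_samples=10_000, n_features=10, random_state=0)
        X = pd.DataFrame(X)
        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

        n_cores = mp.cpu_count()             # e.g. 16 on the benchmarking VM
        chunks = np.array_split(X, n_cores)  # one block of rows per core

        with mp.Pool(n_cores) as pool:
            results = pool.map(explain_chunk, [(model, c) for c in chunks])

        # Stitch the per-chunk results back into one (n_rows, n_features)
        # array of marginal feature attributions for the full dataset.
        shap_values = np.vstack(results)
        print(shap_values.shape)  # (10000, 10)

Because each row's attributions are independent of every other row's, the work partitions cleanly across row blocks, which is why this style of multicore parallelism can scale with the number of available cores on a machine such as the 16-core VM used in the benchmarks.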
