Interpretable ML 2: From Linear Idols to Non-Linear Frontiers: A Quest for Interpretability
In the realm of machine learning, linear models have long held a position of reverence, prized for their simplicity and ease of interpretation. However, these cherished models, like gilded idols enshrined in a dimly lit cave, come with their own set of limitations and blind spots. As a group of data scientists gathered around the flickering light of a sacrificial fire, chanting their praises, an astute statistician known as the Adventurer Statistician observed with a mixture of disbelief and concern. She knew that linear models, while useful for certain tasks, could lead to flawed interpretations and detrimental decisions when applied blindly.
The Adventurer Statistician had ventured far beyond the confines of the linear world, exploring the uncharted territories of non-linear models. She had witnessed the power of these models to unlock hidden patterns and uncover complex relationships, but she also recognized their inherent challenges in terms of interpretability. Collinearity, she knew, was not unique to linear models but an inherent challenge across machine learning, and overcoming it required a multifaceted approach encompassing data preprocessing, model selection, and careful interpretation.
The Adventurer Statistician knew that the path to truly interpretable machine learning lay not in clinging to simplistic linear models but in embracing the full spectrum of machine learning techniques. She envisioned a world where data scientists, armed with a diverse toolkit of interpretability methods, could navigate the complexities of non-linear models and extract meaningful insights.
With this vision in mind, she embarked on a quest to illuminate the hidden pathways of interpretable machine learning, venturing into the realm of gradient boosting machines (GBMs). GBMs, with their ability to capture intricate non-linear relationships, promised to revolutionize our understanding of complex systems. However, they also carried the risk of perpetuating the same interpretability challenges that plagued their linear counterparts.
Let’s follow her in this new venture!
👮 Enforcing Interactions and Absence of Interactions
Managing interactions proves to be an effective approach to enhancing the interpretability of GBMs. Interactions can be classified into two categories: univariate, where a feature interacts with itself (such as x²), and multivariate, where distinct features interact (such as x₁·x₂²). In certain situations, it may be desirable to break the observed correlations between predictors, making your model behave as if the predictors were independent. This simplification, however, comes at the cost of introducing a bias: the model deviates from the true data distribution.
Enforcing or excluding interactions between predictors in your model can be justified by theoretical grounds or practical considerations. However, relying solely on intuition, such as assuming “all else being equal,” can be misleading and lead to inaccurate conclusions. A multidimensional prediction hypersurface cannot be reduced to a simple “one-way” or “univariate” relationship between the target variable and individual predictors, as this implies statistical independence among predictors, which is rarely the case.
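It helps to see what excluding all interactions buys you. If every branch of every tree is restricted to a single feature, the ensemble collapses to an additive, GAM-like form, ŷ = f₁(x₁) + f₂(x₂) + … + fₚ(xₚ), where each fᵢ is a univariate function learned by boosting. Each predictor’s contribution can then be read off in isolation, which is precisely the interpretability gain we are after.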
Use Cases:
- Theoretically Sound: If there is a solid theoretical basis or rationale for enforcing or excluding specific interactions, you can incorporate these constraints to align the model with its underlying principles.
- Legal Requirement: In certain domains or applications, legal or regulatory requirements may necessitate the inclusion or exclusion of specific interactions in the model.
- Intuition Verified by Analysis: If careful analysis and examination validate a specific intuition regarding interactions, and this intuition has only a marginal impact on model performance, you may choose to incorporate it.
- Complexity-Accuracy Trade-Off: Enforcing or excluding interactions changes the model’s complexity. Carefully consider the trade-off between complexity and accuracy, ensuring that any constraint you impose is worth its impact on performance (a quick comparison is sketched after the example below).
Example:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from lightgbm import LGBMRegressor

cal_housing = fetch_california_housing(download_if_missing=False)
print("feature names:", cal_housing.feature_names)
print("data shape: ", cal_housing.data.shape)
print("description:")
print(cal_housing.DESCR)
house_df = pd.DataFrame(cal_housing.data)
house_df.columns = cal_housing.feature_names
# center around the mean value
house_df["target"] = cal_housing.target - np.mean(cal_housing.target)
# Filter out the large houses and "outliers", for illustration purposes
house_df = house_df[
(house_df.AveRooms < 10)
& (house_df.AveBedrms < 2)
& (house_df.AveOccup < 20)
& (house_df.Population < 10_000)
& (house_df.AveOccup < 6)
]
X_train, X_test, y_train, y_test = train_test_split(
    house_df.drop(columns="target"), house_df.target, test_size=0.25, random_state=42
)
# One singleton group per feature: no feature is allowed to interact with another
singletons = [[i] for i in range(X_train.shape[1])]
# Create the model
# interaction_constraints restricts each tree branch to features from a single
# group; with singleton groups, all feature interactions are forbidden
lgb_no_int = LGBMRegressor(interaction_constraints=singletons)
# Fit the model
lgb_no_int.fit(X_train, y_train)
# Predict
y_pred = lgb_no_int.predict(X_test)
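The converse, enforcing a specific interaction, uses the same parameter: place the features you want to allow to interact in a shared group. Below is a minimal sketch; the MedInc/AveOccup pairing is purely illustrative, not a recommendation. An unconstrained model is also fitted as a baseline for the complexity-accuracy trade-off discussed above.
# Hypothetical grouping: MedInc and AveOccup may interact with each other,
# while every other feature remains univariate
feature_idx = {name: i for i, name in enumerate(X_train.columns)}
pair = [feature_idx["MedInc"], feature_idx["AveOccup"]]
groups = [pair] + [[i] for i in range(X_train.shape[1]) if i not in pair]
lgb_pair = LGBMRegressor(interaction_constraints=groups).fit(X_train, y_train)
# Unconstrained baseline, to gauge what the constraints cost in accuracy
lgb_full = LGBMRegressor().fit(X_train, y_train)
print("no interactions R2:", lgb_no_int.score(X_test, y_test))
print("one pair allowed R2:", lgb_pair.score(X_test, y_test))
print("unconstrained R2: ", lgb_full.score(X_test, y_test))
If the constrained scores sit close to the unconstrained one, the interpretability comes nearly for free.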
In the upcoming third episode on interpretability, I’ll explain how to verify that the desired correlation annihilation has indeed been achieved and that our model effectively behaves as if the predictors were uncoupled from one another.
👮 Monotonic Constraints
Another effective approach to controlling model behavior and enhancing interpretability is to enforce monotonicity constraints. This strategy imposes a directional, order-preserving relationship between the target variable and a predictor, ensuring that the model’s predictions conform to intuitive expectations (for instance, never decreasing as the predictor increases). However, it’s crucial to exercise caution when applying monotonicity constraints solely based on intuition, as this can lead to oversimplified assumptions that deviate from the true underlying relationships.
For instance, if it seems reasonable to assume that an applicant whose credit card balance is higher than that of an already-rejected customer with an otherwise identical credit profile should also be rejected, you can enforce a monotonicity constraint based on this principle.
(Note: This example serves as an illustration of the concept and should not be construed as a universal guideline.)
# 1 = non-decreasing in MedInc; 0 = unconstrained
monotone_constraints = [1 if col == "MedInc" else 0 for col in cal_housing.feature_names]
lgb_mon = LGBMRegressor(monotone_constraints=monotone_constraints)
lgb_mon.fit(X=X_train, y=y_train)
# Unconstrained counterpart (lgb_nmon), used for comparison below
lgb_nmon = LGBMRegressor().fit(X=X_train, y=y_train)
Let’s examine monotonicity with respect to ‘MedInc’. The data below are synthetic, newly generated dummy points, solely for illustrative purposes; they are an example of ‘something that should not be’:
# Sweep MedInc over a grid while holding every other feature fixed
dum_new = pd.DataFrame(
{
"MedInc": np.linspace(1, 10, 100),
"HouseAge": 20 * np.ones(100),
"AveRooms": 5 * np.ones(100),
"AveBedrms": np.ones(100),
"Population": 1000 * np.ones(100),
"AveOccup": 3 * np.ones(100),
"Latitude": 34 * np.ones(100),
"Longitude": -118 * np.ones(100),
}
)
These artificial data points are inherently flawed due to their deviation from the original multivariate distribution, commonly referred to as the Data Generating Process (DGP). In most instances, we lack direct knowledge of this underlying distribution and can only approximate it through statistical techniques like parametric, semi-parametric, or non-parametric methods.
If the ‘MedInc’ (median income) variable were statistically independent of the other predictors, we could freely manipulate its values without affecting the remaining predictors. The joint data distribution could then be decomposed as
p(MedInc, X₋) = p(MedInc) · p(X₋),
where X₋ encompasses all predictors excluding ‘MedInc’. Nevertheless, this independence assumption contradicts reality, as ‘MedInc’ exhibits correlations with other variables in the dataset.
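A quick empirical check (a minimal sketch, reusing the house_df frame built earlier) makes this concrete:
# Correlations of MedInc with the other predictors; values far from zero
# contradict the independence assumption above
print(house_df.drop(columns="target").corr()["MedInc"].round(2))
Any non-trivial correlation tells us that sweeping MedInc while freezing everything else walks the model off the data manifold.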
# Predict along the MedInc grid with both models
y_lgbmon = lgb_mon.predict(dum_new)
y_lgbnmon = lgb_nmon.predict(dum_new)
import matplotlib.pyplot as plt

plt.plot(dum_new.MedInc, y_lgbmon, label="monotonic")
plt.plot(dum_new.MedInc, y_lgbnmon, label="non-monotonic")
plt.legend()
plt.title("Monotonic vs non-monotonic LightGBM")
plt.xlabel("MedInc")
plt.ylabel("Target");
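Beyond eyeballing the curves, a one-line check over the synthetic grid confirms that the constrained model never decreases as MedInc grows:
# Consecutive predictions along the increasing MedInc grid must never decrease
assert np.all(np.diff(y_lgbmon) >= 0), "monotonicity violated"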
We stand triumphant: monotonicity has been successfully enforced in our model.
Conclusion
As our journey through the realm of statistical modeling unfolds, we have delved into the intricacies of interactions and monotonicity, mastering these concepts like seasoned adventurers navigating uncharted territories. We’ve learned that interactions, like hidden treasures buried beneath layers of data, can reveal profound insights when unearthed. Similarly, monotonicity, akin to a compass guiding our path, ensures our models adhere to intuitive expectations.
By harnessing these newfound insights, we have empowered our GBMs to become more predictable, akin to trusty steeds carrying us through the unpredictable landscapes of data. These models, once prone to unexpected deviations, now faithfully mirror the intricate patterns concealed within our datasets.
With this newfound predictability, our GBMs embark on a new chapter, their prowess amplified by our understanding of interactions and monotonicity. We stand ready to tackle the challenges that lie ahead, armed with these potent statistical tools. May our quest for knowledge continue, illuminating the path towards a future where data-driven predictions reign supreme.