Shapley Value & Feature Importance

Nazrul Miya
Jul 7, 2021

A solution concept in cooperative game theory

A machine learning model is trained on a set of features (x1, x2, x3, …, xn), and the model predicts an outcome when feature values are provided. The question is: how do you explain why the model predicted an outcome P rather than Q? Which feature values influenced the model to predict P over Q? The answer to these questions is ‘feature importance’, which helps us inspect a model by understanding each feature’s contribution to the model’s prediction. This story explains a popular solution concept used to compute feature importances.

Shapley Value

The backbone of feature attribution is a concept from cooperative game theory called the “Shapley value”. In a cooperative game (a competition in which several players each contribute towards winning), the Shapley value is a way to find a fair, unique distribution of the total reward generated by the players’ involvement. In game theory, a game can be anything that involves participants, each of whom contributes towards reaching the goal. For example, running a business can be viewed as a game in which owners, co-owners, and other staff contribute towards generating revenue.

Let's imagine two entrepreneurs, Enp-1 and Enp-2, running a startup. Consider this a cooperative game, because both are cooperatively involved in the business. When the startup is managed by Enp-1 alone, the revenue generated is $25K. When Enp-2 alone manages operations, the revenue is $50K. And when both of them manage operations cooperatively, the startup's revenue grows to $80K. The Shapley values represent an allocation of this revenue to the entrepreneurs based on their marginal contributions.

Marginal contribution represents the true contribution of a particular player when that player joins the others in a game. It is the difference between the reward the game produces with the specific player and without that player.

Shapley values are derived from each individual's marginal contributions: the greater the contribution, the bigger the allocated share of the reward, i.e. the Shapley value.
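For the startup above, the ordering (Enp-1, Enp-2) gives marginal contributions of $25K and $55K, while the ordering (Enp-2, Enp-1) gives $50K and $30K. Averaging over the two orderings gives Shapley values of $27.5K for Enp-1 and $52.5K for Enp-2, which sum to the $80K total. Below is a minimal Python sketch of this computation; the function and variable names are my own illustration, not from the story.

```python
from itertools import permutations

# Revenue (in $K) generated by each coalition of entrepreneurs.
revenue = {
    frozenset(): 0,
    frozenset({"Enp-1"}): 25,
    frozenset({"Enp-2"}): 50,
    frozenset({"Enp-1", "Enp-2"}): 80,
}
players = ["Enp-1", "Enp-2"]

def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value[frozenset(coalition)]
            coalition.add(p)
            after = value[frozenset(coalition)]
            totals[p] += after - before   # marginal contribution of p in this order
    return {p: totals[p] / len(orders) for p in players}

print(shapley_values(players, revenue))
# {'Enp-1': 27.5, 'Enp-2': 52.5}  -> the shares sum to the $80K total
```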

Shapley value in feature importance

Shapley values provide a way to explain the predictions of a machine learning model. A model is trained on a dataset with a set of features, and the Shapley value gives a way to compute each feature's fair contribution to the model's prediction. This helps us interpret which features have a high influence on the prediction. Think of a feature as a player in game theory, and of the model's prediction as the reward or earnings in a game.

Consider a model trained on a feature set F. The marginal contribution of adding one feature at a time can be calculated just as in the game above. These marginal contributions (MCs) are the differences between the model's prediction with and without the feature. The Shapley value, or allocation unit, can then be derived from these marginal contributions.

In the above computation of MCs, features are added sequentially, one at a time. What if the order of adding the features is changed? For example, what if feature x3 is included before feature x1? The plot on the left shows marginal contributions when features are added in the order x1, x2, x3; the plot on the right shows the MCs when the order is changed. Changing the order may change the marginal contributions. Hence all possible orderings of the features need to be considered when computing MCs and, from them, Shapley values, as in the sketch below.
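A minimal sketch of this exhaustive computation, assuming the "prediction without a feature" is approximated by replacing that feature with a background (e.g. average) value, which is one common choice; the model and helper names here are illustrative, not from the story, and the n! orderings make this feasible only for a handful of features.

```python
from itertools import permutations
import numpy as np

def shapley_feature_importance(model, x, background):
    """Exact Shapley values for one instance x, averaging marginal
    contributions over every ordering of the features.

    Features not yet 'added' keep their background (e.g. mean) values,
    one common way to approximate a prediction 'without' a feature.
    """
    n = x.shape[0]
    phi = np.zeros(n)
    orders = list(permutations(range(n)))
    for order in orders:
        current = background.copy()
        prev = model.predict(current.reshape(1, -1))[0]
        for i in order:
            current[i] = x[i]                              # add feature i
            pred = model.predict(current.reshape(1, -1))[0]
            phi[i] += pred - prev                          # marginal contribution
            prev = pred
    return phi / len(orders)
```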

Shapley Value Equation
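The equation itself appears as an image in the original post; for reference, the standard Shapley value of a feature i in a feature set F, with v(S) denoting the model's prediction using only the features in subset S, is:

```latex
\phi_i \;=\; \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \Bigl[ v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr]
```

The weight in front of each marginal contribution is the fraction of orderings in which exactly the features in S precede feature i, so the formula is the permutation average from the previous section, rewritten as a sum over subsets.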

Local feature importance

By now we understand that an MC is the difference between the model's prediction with and without a specific feature, computed for a specific data instance used for the prediction. This is referred to as 'local feature importance': the Shapley value captures feature importance for one particular prediction, so Shapley values differ across instances with different feature values.

Dependency Plot

This plot helps us understand local feature importance. The x-axis shows the values of a particular feature, and the y-axis shows the Shapley values for that feature.


Consider the plot on the left, a dependency plot summarising the local feature importance of the feature 'Hours per week'. It shows that for low values of 'Hours per week', the Shapley values are low, meaning the feature has little impact on the model's prediction when its values are in the low range.
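The plots in this story resemble those produced by the shap library on the adult census income dataset; a minimal, assumed sketch of how such a dependency plot could be produced (the xgboost model choice is mine, not from the story):

```python
import shap
import xgboost

# Assumed setup: a tree model trained on the adult census income data.
X, y = shap.datasets.adult()          # includes 'Hours per week', 'Age', 'Capital Gain'
model = xgboost.XGBClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Dependency plot: feature value on the x-axis, its Shapley value on the y-axis.
shap.dependence_plot("Hours per week", shap_values, X)
```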

The plot on the left shows feature importance scores when a single data instance (one row) is used for the model's prediction: the feature 'Capital Gain' has a strong positive impact on the model's output, and the feature 'Age' has a strong negative impact, so both features matter for this prediction. The plot on the right shows the average impact of each feature across the dataset; there, the feature 'Age' has the highest overall impact (positive and negative) on the model's prediction.
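Continuing the assumed shap setup above, a per-instance (local) explanation and a dataset-wide (global) summary could be produced roughly like this:

```python
# Local explanation for one row: which features push this prediction up or down.
row = 0
shap.force_plot(explainer.expected_value, shap_values[row, :], X.iloc[row, :],
                matplotlib=True)

# Global view: average magnitude of each feature's Shapley values across the dataset.
shap.summary_plot(shap_values, X, plot_type="bar")
```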

Google Cloud's 'AI Explanations' service uses Shapley values to provide explanations (feature attributions) for models deployed on Google Cloud's AI Platform.
