Ernesto Rivera and María José Arteaga
1.0
2016-05-30
Shapley-based R-Squared decomposition
C13 ning
This function provides the user with a decomposition of the R-Squared of a
linear regression by deriving the marginal contribution of each explanatory
variable included in the specification. We employ the methodology proposed by
Israeli, O. (2007) "A Shapley-based decomposition of the R-squared of a linear
regression". The Journal of Economic Inequality, 5(2), 199-212.
As explained by Israeli (2007), the methodology combines an inequality
decomposition procedure with the concept of the Shapley value from game theory.
The procedure is as follows: the variance of the dependent (explained) variable
can be decomposed into the contributions of the independent (explanatory) variables
included in the specification and that of the residual. The marginal contribution
of a variable can therefore be measured as the difference between the R-Squared
obtained with and without that variable in the regression. However, when the
specification contains several explanatory variables, this marginal contribution
depends on the order in which the variables are eliminated. To address this issue,
each variable's contribution is computed as the average of its marginal
contributions over all possible elimination sequences.
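The averaging over elimination sequences can be illustrated with a short Python sketch. It assumes a hypothetical table of R-Squared values for every subset of three regressors named a, b and c (made-up numbers, not estimates from any dataset), computes each variable's marginal gain in every possible ordering, and averages those gains. The function names are illustrative only.

```python
from itertools import permutations

# Hypothetical R-Squared for each subset of regressors (a constant is always
# included, so the constant-only regression has R-Squared 0). Made-up numbers.
R2_TABLE = {
    frozenset(): 0.0,
    frozenset("a"): 0.50, frozenset("b"): 0.30, frozenset("c"): 0.10,
    frozenset("ab"): 0.60, frozenset("ac"): 0.55, frozenset("bc"): 0.35,
    frozenset("abc"): 0.65,
}


def contributions_in_order(order):
    """Marginal R-Squared gain of each variable when variables enter in `order`."""
    gains, included = {}, frozenset()
    for var in order:
        gains[var] = R2_TABLE[included | {var}] - R2_TABLE[included]
        included = included | {var}
    return gains


def shapley_by_permutations(variables):
    """Average each variable's marginal gain over all possible orderings."""
    orders = list(permutations(variables))
    totals = dict.fromkeys(variables, 0.0)
    for order in orders:
        for var, gain in contributions_in_order(order).items():
            totals[var] += gain
    return {var: total / len(orders) for var, total in totals.items()}


print(contributions_in_order("abc")["a"])  # gain of a when it enters first
print(contributions_in_order("cba")["a"])  # gain of a when it enters last
print(shapley_by_permutations("abc"))
```

The first two lines show that the gain attributed to a depends on the order; the averaged contributions add up to the R-Squared of the full regression (0.65 in this toy table).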
*Inputs
This function requires the following:
1) A list of the independent variables included in the specification.
A constant term is added automatically, so the list should not include const.
2) A series of the dependent variable.
Note: the function stops with an error if the dependent variable or any of the
independent variables has missing values in the current sample; restrict the
sample (for example with smpl) before calling it.
*Output
The function returns a column vector with the contribution of each explanatory
variable to the R-Squared, in the order in which the variables appear in the list.
*Example
To illustrate how to obtain the marginal contributions to the R-Squared of a
specification, consider a linear regression of the money stock (M2, the series
money) on real GNP (rgnp) and the interest rate (interest), using Nelson and
Plosser's dataset (np.gdt):
open np.gdt
smpl 1909 1970
list x=rgnp interest
Shapley(x,money)
The function then prints the following (and returns the vector C_t):
R-Squared of linear regression:
r2 = 0.98405857
Marginal contribution of each independent variable to the R-Squared of a linear regression:
C_t (2 x 1)
0.93034
0.053719
These values are the contributions of each independent variable to the goodness
of fit of the model for the money stock: of the 0.98 R-Squared obtained in the
regression, 0.93 is attributable to real GNP and 0.05 to the interest rate.
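With two regressors there are only two elimination orders, each with weight 1/2, so the decomposition can be written out explicitly. Here R^2(.) denotes the R-Squared of a regression with a constant and the listed regressors, and R^2(\emptyset) = 0 for the constant-only regression:

```latex
\begin{aligned}
C_{\text{rgnp}} &= \tfrac{1}{2}\bigl[R^2(\text{rgnp}) - R^2(\emptyset)\bigr]
  + \tfrac{1}{2}\bigl[R^2(\text{rgnp},\text{interest}) - R^2(\text{interest})\bigr] \\
C_{\text{interest}} &= \tfrac{1}{2}\bigl[R^2(\text{interest}) - R^2(\emptyset)\bigr]
  + \tfrac{1}{2}\bigl[R^2(\text{rgnp},\text{interest}) - R^2(\text{rgnp})\bigr] \\
C_{\text{rgnp}} + C_{\text{interest}} &= R^2(\text{rgnp},\text{interest})
\end{aligned}
```

This is why the two contributions in the example add up to the full R-Squared: 0.93034 + 0.053719 = 0.984059, which matches r2 = 0.98405857 up to rounding.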
Parameters of Shapley(x, Y):
x: list of independent variables
Y: series, dependent variable
```
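matrix y = {Y}
matrix X = {x}
scalar T = rows(X)     # number of observations
scalar N = cols(X)     # number of independent variables
scalar M = 2^N         # number of possible regressions (subsets of regressors)
scalar M2 = M-1
matrix Z = zeros(M,1)  # Z[k+1] = R-Squared of the regression on subset k
matrix C = zeros(1,N)  # Shapley contributions
scalar mv_Y = sum(missing(Y))
scalar mv_x = sum(missing(x))
if mv_Y == 0 && mv_x == 0
    # Subset k keeps column c if bit (c-1) of k is 1. Columns are dropped
    # from the last to the first, so the remaining indices stay valid.
    loop k=0..M2 --quiet
        matrix Xs = X
        loop i=1..N --quiet
            k2 = N-i+1
            if floor((k%(2^k2))/2^(k2-1)) == 0
                matrix Xs = DCM(Xs,k2)
            endif
        endloop
        # a constant is always included
        matrix Z[k+1] = R2(ones(T,1)~Xs,y)
    endloop
    # Shapley value: weighted sum of the R-Squared gain from adding variable j
    # to each subset of s other regressors, with weight s!(N-s-1)!/N!
    loop j=1..N --quiet
        loop m=0..M2 --quiet
            if floor((m%(2^j))/2^(j-1)) == 1
                scalar m0 = m - 2^(j-1)   # same subset without variable j
                scalar s = 0              # number of regressors in that subset
                loop b=1..N --quiet
                    s += floor((m0%(2^b))/2^(b-1))
                endloop
                scalar w = gammafun(s+1)*gammafun(N-s)/gammafun(N+1)
                matrix C[j] = C[j] + w*(Z[m+1]-Z[m0+1])
            endif
        endloop
    endloop
    print "R-Squared of linear regression:"
    ols Y 0 x --quiet
    scalar r2 = $rsq
    print r2
    print "Marginal contribution of each independent variable to the R-Squared of a linear regression:"
    matrix C_t = transp(C)
    print C_t
    return C_t
else
    funcerr "Error: x and Y must not have missing values. Please define a sub-sample with no missing values."
endif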
```
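The subset indexing and the Shapley weighting can be sketched in Python as follows. Regression k (k = 0, ..., 2^N - 1) keeps column c exactly when bit c - 1 of k is set, and each variable's contribution is the sum over subsets S without j of s!(N - s - 1)!/N! times [R^2(S with j) - R^2(S)], where s is the size of S. The function names and the list z (one R-Squared per subset, indexed by k) are hypothetical, and the numbers in the usage lines are made up. For N = 2 every weight equals 1/2, which coincides with a uniform 1/2^(N-1) average; for N of 3 or more the two weightings differ.

```python
from math import factorial


def kept_columns(k, n):
    """1-based columns kept in regression k: column c is kept iff bit c-1 of k is 1."""
    return [c for c in range(1, n + 1) if (k >> (c - 1)) & 1]


def shapley_from_subsets(z, n):
    """Shapley contribution of each of n regressors.

    z[k] is the R-Squared of the regression (with constant) on kept_columns(k, n);
    z[0] belongs to the constant-only regression, whose R-Squared is 0.
    """
    contributions = [0.0] * n
    for j in range(1, n + 1):
        bit = 1 << (j - 1)
        for m in range(2 ** n):
            if m & bit:                  # subset m contains variable j
                m0 = m - bit             # same subset without variable j
                s = bin(m0).count("1")   # number of other regressors in it
                w = factorial(s) * factorial(n - s - 1) / factorial(n)
                contributions[j - 1] += w * (z[m] - z[m0])
    return contributions


# Made-up R-Squared values for N = 3, indexed by k as in kept_columns.
z = [0.0, 0.50, 0.30, 0.60, 0.10, 0.55, 0.35, 0.65]
print(kept_columns(5, 3))
print(shapley_from_subsets(z, 3))
```

Enumerating all 2^N subsets once and reusing them for every variable is what keeps the method practical for a modest number of regressors; the cost still doubles with each additional variable.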

Parameters of DCM(X, i):
X: initial matrix
i: index of the column to be dropped
```
# Return X with column i removed (an empty matrix if X has a single column)
scalar N = cols(X)
if N > 1
    if i == 1
        matrix Y = X[,2:N]
    elif i == N
        matrix Y = X[,1:N-1]
    else
        matrix Y = X[,1:i-1]~X[,i+1:N]
    endif
else
    matrix Y = {}
endif
return Y
```
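A Python counterpart of DCM is sketched below, assuming matrices are represented as lists of rows; the name drop_column is hypothetical. The first, last and middle cases handled separately above collapse into a single slicing rule, and dropping the only column yields an empty matrix, as DCM returns {}.

```python
def drop_column(X, i):
    """Return a copy of X (a list of rows) without its i-th column (1-based).

    If X has a single column, an empty matrix is returned, mirroring DCM.
    """
    if not X or len(X[0]) <= 1:
        return []
    return [row[: i - 1] + row[i:] for row in X]


X = [[1, 2, 3],
     [4, 5, 6]]
print(drop_column(X, 1))
print(drop_column(X, 2))
print(drop_column(X, 3))
```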

Parameters of R2(X, y):
X: matrix of independent variables (including the constant)
y: vector of the dependent variable
```
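# R-Squared of the OLS regression of y on X (X must include the constant)
matrix betas = invpd(X'*X)*(X'*y)   # OLS coefficients
matrix u = y-X*betas                # residuals
scalar SSR = u'*u                   # sum of squared residuals
scalar meany = mean(y)
scalar SST = (y-meany)'*(y-meany)   # total sum of squares around the mean
scalar R2 = 1-(SSR/SST)
return R2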
```
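The R2 function computes the R-Squared from the OLS normal equations. A dependency-free Python sketch of the same calculation follows; the names solve and r_squared are hypothetical, and it solves the linear system by Gaussian elimination rather than forming the inverse as invpd does.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear regressors?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def r_squared(X, y):
    """R-Squared of an OLS regression of y on X (each row of X must include the constant 1)."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    betas = solve(xtx, xty)
    ssr = sum((yi - sum(b * xv for b, xv in zip(betas, row))) ** 2
              for row, yi in zip(X, y))
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ssr / sst


X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # constant plus one regressor
y = [1, 3, 2, 4]
print(r_squared(X, y))
```

With only the constant column, the fitted value is the mean of y, so SSR equals SST and the R-Squared is 0; this is the value assigned to the empty subset in the decomposition.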

```
include ShapleyCont.gfn
open np.gdt
smpl 1909 1970
list x=rgnp interest
Shapley(x,money)
```