
OBJECTIVE - 1 

Predict the opioid-related death ratio in US states based on socio-economic characteristics. 

Ridge Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large so they may be far from the true value. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors. It is hoped that the net effect will be to give estimates that are more reliable. [NCSS Statistical Software] 

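  • In ridge regression, the OLS loss function is augmented so that we not only minimize the sum of squared residuals but also penalize the size of the parameter estimates, in order to shrink them towards zero: 

    $L_{ridge}(\hat{\beta}) = \sum_{i=1}^{n} (y_i - x_i'\hat{\beta})^2 + \lambda \sum_{j=1}^{p} \hat{\beta}_j^2 = \|Y - X\hat{\beta}\|^2 + \lambda \|\hat{\beta}\|^2$

  • Solving this for $\hat{\beta}$ gives the ridge regression estimates, $\hat{\beta}_{ridge} = (X'X + \lambda I)^{-1} X'Y$, where $I$ denotes the identity matrix. 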

  • The λ parameter is the regularization penalty. We will talk about how to choose it in the next sections of this tutorial, but for now notice that: 

    • As $\lambda \to 0$, $\hat{\beta}_{ridge} \to \hat{\beta}_{OLS}$; 

    • As $\lambda \to \infty$, $\hat{\beta}_{ridge} \to 0$. 
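The closed-form estimate and its two limits can be checked numerically. The following is a minimal sketch on synthetic data, not the study's data or code; it assumes NumPy is available, that the predictors are centered (no intercept term), and the name `ridge_estimate` is introduced here for illustration only.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^(-1) X'y (no intercept)."""
    p = X.shape[1]
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data with two nearly collinear predictors (illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=200)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least squares
beta_small = ridge_estimate(X, y, 1e-10)          # lambda close to 0
beta_mid = ridge_estimate(X, y, 10.0)             # moderate shrinkage
beta_large = ridge_estimate(X, y, 1e10)           # lambda very large

print("OLS:          ", beta_ols)
print("lambda=1e-10: ", beta_small)
print("lambda=10:    ", beta_mid)
print("lambda=1e10:  ", beta_large)
```

With a tiny lambda the estimate should coincide with OLS, and with a huge lambda it should collapse towards zero; intermediate values trade a little bias for a smaller coefficient norm.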

Approach: Ridge Regression Model 

Reason 

As a first step, we checked the assumptions of the linear regression model and fitted a linear regression model. This model showed high VIF values for most of the variables, which indicates multicollinearity. We also needed to account for the effect of the different states on the death ratio, so we moved to a linear mixed model. In that model, most of the variables were not significant in predicting the death ratio, and the random effect of state was also not significant. The multicollinearity was still present, with most variables showing high VIF values. We therefore removed the variables with high multicollinearity and fitted another model, but some multicollinearity remained. As the next approach to handling the multicollinearity, we moved to ridge regression. 

We tried several models: linear model, linear mixed model, ridge regression, ridge random effect model, ridge regression (removing variables), and ridge random effect model (removing variables). We selected the best model by comparing AIC (Akaike information criterion), an estimator of out-of-sample prediction error and thereby of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model relative to the others; lower values indicate a better model. The table is shown below: 
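The VIF check mentioned above can be sketched as follows. This is an illustrative example on synthetic data, assuming NumPy; the function name `vif` and the variables `a`, `b`, `c` are hypothetical and not taken from the study. The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others; values above roughly 5 to 10 are commonly treated as a rule-of-thumb sign of multicollinearity.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X: 1 / (1 - R_j^2)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors (illustrative only): b is nearly a copy of a.
rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = a + 0.1 * rng.normal(size=300)
c = rng.normal(size=300)             # unrelated to a and b
vifs = vif(np.column_stack([a, b, c]))
print(vifs)  # a and b get large VIFs, c stays close to 1
```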

Model                                            AIC Value    Adjusted R2 (%)
Ridge Random Effect Model (Removing Variables)   -1005.872    60.58
Ridge Regression (Removing Variables)            -1722.147    66.71
Ridge Random Effect Model                        -1003.712    61.43
Ridge Regression                                 -1074.972    66.50
Linear Mixed Model                               -1091.117    52.50

From the table above, ridge regression after removing the high-VIF variables gives the lowest (best) AIC and the highest adjusted R2 of the models compared. We therefore proceeded with the ridge regression model. 
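For readers unfamiliar with the two comparison criteria, here is a small sketch of how they are commonly computed. It assumes a Gaussian least-squares fit, for which AIC equals n·ln(RSS/n) + 2k up to an additive constant; how AIC was actually computed for the models above is not stated, and for penalized models the parameter count k is often replaced by an effective degrees of freedom. All numbers and function names below are hypothetical.

```python
import math

def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit, up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical fits, not taken from the study:
# model B fits slightly better (lower RSS) but uses many more parameters.
aic_a = gaussian_aic(rss=0.020, n=100, k=5)
aic_b = gaussian_aic(rss=0.019, n=100, k=12)
print(aic_a, aic_b)               # the lower AIC is preferred

print(adjusted_r2(0.5, 101, 10))  # penalizes R^2 for the number of predictors
```

A small residual sum of squares makes ln(RSS/n) strongly negative, which is one way negative AIC values can arise.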

Output 

Ridge Regression 

Model Fitting

Picture1.png
  • We applied ridge regression to address the multicollinearity in the data.  

  • The lowest point of the curve indicates the optimal lambda: the value of log(lambda) that minimizes the cross-validation error. 

Here the optimal lambda for this model is 0.0014. 
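Choosing lambda by cross-validation can be sketched as below. This is an illustration on synthetic data, not the code used to produce the curve above (the software used is not stated); it assumes NumPy, 5-fold cross-validation, and a hypothetical grid of 30 lambda values. The helper names `ridge_fit` and `cv_mse` are introduced here only for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Mean squared prediction error of ridge, averaged over k held-out folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # all rows not in this fold
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errors))

# Synthetic data (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=2.0, size=120)

lambdas = np.logspace(-4, 3, 30)                  # hypothetical search grid
scores = [cv_mse(X, y, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(scores))]     # lowest point of the CV curve
print(best_lambda, min(scores))
```

Plotting `scores` against `np.log(lambdas)` gives a curve of the kind described above, with the optimal lambda at its minimum.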

Ridge Regression After Removing High-VIF Variables

  • We removed the variables with high VIF values (Women, VotingAgeCitizen, etc.). 

  • In this model, most of the remaining variables appear to be significant. 

Picture2.png

Model Validation 

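  • We fitted the model and used it to predict on the test data. 

  • The RMSE is 0.000373, which suggests that the model predicts the test data well. RMSE is expressed in the same units as the death ratio, so it should be judged against the scale of that variable. 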
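The RMSE computation itself is simple; a minimal sketch follows. The observed and predicted values are hypothetical and are not the study's data. Dividing the RMSE by the mean of the observed values is one way to put it on a relative scale.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical death ratios (illustrative only).
observed = [0.0010, 0.0020, 0.0015, 0.0030]
predicted = [0.0012, 0.0018, 0.0015, 0.0026]

score = rmse(observed, predicted)
relative = score / (sum(observed) / len(observed))   # RMSE relative to the mean
print(score, relative)
```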

Code 

  • Data, materials, and code are available upon request.

Conclusion 

  • The model suggests that construction workers are more exposed to opioid-related deaths and may be the heaviest users of opioid-related drugs.  

  • According to the model, the share of men has a slightly negative effect on opioid-related deaths. 

  • In terms of race, Black people have lower exposure to opioid-related deaths. 

  • Regarding education, the share of people whose highest qualification is a high school diploma or a bachelor's degree has an effect on opioid-related deaths. 

  • People who work from home are less exposed to opioid-related deaths. 
