Feature Selection — Coefficients and Lasso Regularization

Linear Regression Coefficients

Linear regression predicts a quantitative response Y on the basis of predictor variables X1, X2, … Xn. It assumes that there is a linear relationship between X(s) and Y. Mathematically, we write this linear relationship as Y ≈ β0 + β1X1 + β2X2 + … + βnXn.

The magnitude of the coefficients is directly influenced by the scale of the features. Therefore, to compare coefficients across features, it is important that all features are on a similar scale. This is why normalization is important for variable importance and feature selection in linear models.

Coefficients are indicative of the influence of each feature on the outcome, under the following assumptions:

There is a linear relationship between the predictors and the outcome

The features are independent of each other (no correlation)

The variables are normally distributed

All features are on the same scale; if not, we need to standardize them (e.g. with StandardScaler)


The higher the absolute value of the coefficient, the more influence the feature has, and hence the more important it is.

1- Imports

2- Apply StandardScaler

3- from sklearn.feature_selection import SelectFromModel

use LogisticRegression as the classifier

4- sel_.get_support() returns the list of all the features selected

sel_.estimator_.coef_ returns all the coefficients

5- Find the mean of the absolute coefficients (taking the absolute value after averaging would let positive and negative coefficients cancel out):


np.abs(sel_.estimator_.coef_).mean()
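The five steps above can be sketched end to end. The breast cancer dataset and the variable names here are illustrative stand-ins, not from the original:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

# 1- imports done above; load a sample classification dataset
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 2- apply StandardScaler so the coefficients are comparable
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# 3- SelectFromModel with LogisticRegression as the classifier
sel_ = SelectFromModel(LogisticRegression(max_iter=1000))
sel_.fit(X_train_scaled, y_train)

# 4- boolean mask of selected features, and the fitted coefficients
selected = sel_.get_support()
coefs = sel_.estimator_.coef_

# 5- mean absolute coefficient (SelectFromModel's default threshold
#    for a coefficient-based estimator)
mean_abs_coef = np.abs(coefs).mean()
print(selected.sum(), "features selected; mean |coef| =", mean_abs_coef)
```

By default SelectFromModel keeps exactly the features whose absolute coefficient is at or above that mean, so the mask and the threshold agree.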



Regression works exactly the same way as the logistic case, except we don’t add a penalty: LinearRegression() has no penalty term.

1- Prepare the data: include only the numerical values, then train-test split it

2- Implement the selector

3- Get the selected features

4- Include all the features whose absolute coefficients are higher than the mean absolute coefficient
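A minimal sketch of these regression steps, using the bundled diabetes dataset as a stand-in for numeric-only data (the dataset choice is illustrative):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SelectFromModel

# 1- numeric-only data, then train/test split
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# scale so the coefficients are on a comparable footing
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# 2- the selector, with LinearRegression (no penalty term)
sel_ = SelectFromModel(LinearRegression())
sel_.fit(X_train_scaled, y_train)

# 3- the selected features
selected = sel_.get_support()

# 4- by default, SelectFromModel keeps the features whose absolute
#    coefficient is at or above the mean absolute coefficient
mean_abs_coef = np.abs(sel_.estimator_.coef_).mean()
print(selected.sum(), "of", X_train.shape[1], "features kept")
```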


Lasso Regularization

Regularization consists of adding a penalty to the different parameters of the machine learning model to reduce the freedom of the model and avoid overfitting. In linear model regularization, the penalty is applied to the coefficients that multiply each of the predictors. Lasso regularization, or l1, has the property that it is able to shrink some of the coefficients to exactly zero. Those features can therefore be removed from the model.

As we increase the regularization strength, Lasso shrinks different coefficients to zero at different points, so we can rank the features and remove those whose coefficients shrink to zero first.

We DON’T use the l2 (Ridge) penalty for feature selection: it shrinks all the coefficients toward zero together, but never sets any of them exactly to zero, so no features would be removed.
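A quick way to see this difference is to fit both penalties on synthetic data and count exact zeros (the dataset and alpha values here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 20 features, only 5 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=10.0).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# Lasso's coordinate descent produces exact zeros; Ridge only shrinks
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
print("Lasso zeroed", n_zero_lasso, "coefficients; Ridge zeroed", n_zero_ridge)
```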

Classification Lasso Selection

In LogisticRegression, setting penalty='l1' applies the Lasso. If we used the l2 penalty instead, none of the coefficients would be shrunk to 0 and hence no features would be removed.
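A sketch of l1-based selection for classification; the dataset and the C value are illustrative (the liblinear solver is one of the solvers that supports the l1 penalty):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

X_train_scaled = StandardScaler().fit_transform(X_train)

# C is the INVERSE of regularization strength: smaller C = stronger penalty
sel_ = SelectFromModel(
    LogisticRegression(penalty="l1", C=0.5, solver="liblinear"))
sel_.fit(X_train_scaled, y_train)

# features whose coefficient was shrunk exactly to 0 get dropped
removed = int(np.sum(sel_.estimator_.coef_ == 0))
print(removed, "features had their coefficient shrunk to 0")
```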

Regression Feature Selection Using Lasso

1- Prepare the data with only the numeric variables, remove all NA values, then train-test split

2- Apply SelectFromModel with Lasso and alpha=100

The LinearRegression object from sklearn does not allow for regularization. So if you want to fit a regularized linear regression, you need to specifically import Lasso. alpha is the penalization strength; set it high to force the algorithm to shrink some coefficients to zero.

get_support() returns all the features selected

In this example, the coefficients of 4 features shrank to 0, and those features were removed.
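These steps can be sketched as follows. The diabetes dataset is a stand-in for the original data, so the alpha here is tuned to it rather than being the article's value of 100:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

# 1- numeric data (no NA values in this dataset), then train/test split
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

X_train_scaled = StandardScaler().fit_transform(X_train)

# 2- SelectFromModel with Lasso; a higher alpha shrinks more
#    coefficients exactly to 0
sel_ = SelectFromModel(Lasso(alpha=10.0))
sel_.fit(X_train_scaled, y_train)

selected = sel_.get_support()                       # surviving features
n_removed = int(np.sum(sel_.estimator_.coef_ == 0)) # features shrunk to 0
print(n_removed, "features shrunk to 0 and were removed")
```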