Regularization Illustrated

Solomon
Sep 19, 2022


This post illustrates how different regularization strengths affect models trained on imbalanced data and on noisy data.

Photo by Pixabay: https://www.pexels.com/photo/person-drawing-line-2097/

Support Vector Machine

The diagram below shows how an SVM classifies the data for different imbalance ratios and five different regularization values.

Columns, left to right: class imbalance ratios of 100:2, 100:20, 100:40, and 100:80.

Rows, top to bottom: regularization parameter C = 0.001, 1, 100, 1000, 2000.

Model SVM

When C is set to 2000, even the heavily imbalanced data (first column, last row) is classified perfectly.

For the other C values, the margin overlaps both the positive and the negative class.
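The grid above can be approximated in code. The following is a minimal sketch, not the code used to produce the plots: it assumes scikit-learn's `SVC` with a linear kernel, and the two Gaussian blobs, their centers, and the seed are made up for illustration. In scikit-learn, C is the inverse of the regularization strength, so a larger C means weaker regularization. The sketch reports the recall on the minority class for every combination of imbalance ratio and C.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.svm import SVC

# Grid taken from the figure description above.
RATIOS = [(100, 2), (100, 20), (100, 40), (100, 80)]
C_VALUES = [0.001, 1, 100, 1000, 2000]


def make_imbalanced(n_major, n_minor, seed=0):
    """Two Gaussian blobs (hypothetical data): label 0 is the majority class."""
    rng = np.random.default_rng(seed)
    x_neg = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n_major, 2))
    x_pos = rng.normal(loc=[6.0, 6.0], scale=1.0, size=(n_minor, 2))
    X = np.vstack([x_neg, x_pos])
    y = np.array([0] * n_major + [1] * n_minor)
    return X, y


def minority_recall_grid(ratios, c_values):
    """Fit a linear SVM per (ratio, C) and return the minority-class recall on the training data."""
    results = {}
    for n_major, n_minor in ratios:
        X, y = make_imbalanced(n_major, n_minor)
        for c in c_values:
            model = SVC(kernel="linear", C=c).fit(X, y)
            results[(n_major, n_minor, c)] = recall_score(y, model.predict(X), pos_label=1)
    return results


if __name__ == "__main__":
    for key, recall in minority_recall_grid(RATIOS, C_VALUES).items():
        print(key, round(recall, 3))
```

Recall is measured on the training points because the plots also show how the training data is separated. The exact numbers depend on the synthetic data chosen here.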

Logistic Regression

Columns, left to right: class imbalance ratios of 100:2, 100:20, 100:40, and 100:80.

Rows, top to bottom: regularization parameter C = 0.001, 1, 100, 1000, 2000.

Model Logistic Regression

We can observe the same effect for Logistic Regression as well.
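One way to see why C matters for logistic regression is to look at the size of the learned weights. The sketch below is an illustration under assumptions, not the code behind the plots: it uses scikit-learn's `LogisticRegression` with its default L2 penalty, and the blob centers, the 100:20 ratio, and the seed are arbitrary choices. With a small C the weights are shrunk heavily toward zero, so the intercept dominates and the model tends to favor the majority class.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression


def logreg_summary(c_values, n_major=100, n_minor=20, seed=0):
    """Fit L2-regularized logistic regression for each C on hypothetical imbalanced blobs."""
    centers = np.array([[0.0, 0.0], [3.0, 3.0]])
    # Passing a list as n_samples gives one cluster per entry; label 1 is the minority class.
    X, y = make_blobs(n_samples=[n_major, n_minor], centers=centers,
                      cluster_std=1.0, random_state=seed)
    summary = {}
    for c in c_values:
        model = LogisticRegression(C=c, max_iter=5000).fit(X, y)
        pred = model.predict(X)
        summary[c] = {
            "weight_norm": float(np.linalg.norm(model.coef_)),
            "minority_recall": float(np.mean(pred[y == 1] == 1)),
        }
    return summary


if __name__ == "__main__":
    for c, stats in logreg_summary([0.001, 1, 100, 1000, 2000]).items():
        print(c, stats)
```

The weight norm should grow as C increases, which corresponds to a sharper decision boundary in the plots.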

Effect of Outliers vs Regularization

Data setup: for illustration, elliptical data is generated and an outlier is inserted at each of the positions listed below, so that we can observe how the model behaves when the outlier is present.

Columns, left to right: outlier positions (0, 2), (21, 13), (-23, -15), (22, 14), (23, 14).

Rows, top to bottom: SGDRegressor alpha = 0.0001, 1, 100.

Model SGDRegressor

For alpha values of 0.0001 and 1, the fitted hyperplane changes when an outlier is present: it tilts toward the outlier.

For alpha = 100, the outlier has almost no effect: the hyperplane stays essentially the same with or without the outlier. Note that alpha is the regularization strength itself, so, unlike C, a larger alpha means stronger regularization.
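The outlier experiment can be sketched along the same lines. This is a rough reconstruction under assumptions, not the original setup: the covariance of the elliptical cloud, the learning rate `eta0`, the fixed number of epochs, and the seed are all chosen here for illustration. Only the outlier position (21, 13) and the alpha values come from the figure description. The x-coordinate is used as the single feature and the y-coordinate as the target, so the "hyperplane" is a line and its slope summarizes how far it tilts.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor


def make_ellipse(n=200, seed=0):
    """Elliptical point cloud (hypothetical): returns the feature X (n, 1) and the target y (n,)."""
    rng = np.random.default_rng(seed)
    cov = [[4.0, -1.5], [-1.5, 1.0]]
    pts = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    return pts[:, :1], pts[:, 1]


def fitted_slope(X, y, alpha, seed=0):
    """Slope of the line fitted by SGDRegressor with an L2 penalty of strength alpha."""
    model = SGDRegressor(alpha=alpha, penalty="l2", eta0=0.001,
                         max_iter=50, tol=None, random_state=seed)
    model.fit(X, y)
    return float(model.coef_[0])


def slopes_with_and_without_outlier(alphas, outlier=(21.0, 13.0)):
    """Compare the fitted slope on clean data and on data with one added outlier."""
    X, y = make_ellipse()
    X_out = np.vstack([X, [[outlier[0]]]])
    y_out = np.append(y, outlier[1])
    return {
        a: {"clean": fitted_slope(X, y, a), "outlier": fitted_slope(X_out, y_out, a)}
        for a in alphas
    }


if __name__ == "__main__":
    for alpha, slopes in slopes_with_and_without_outlier([0.0001, 1, 100]).items():
        print(alpha, slopes)
```

With a large alpha the slope is shrunk toward zero in both cases, which is why the line barely moves when the outlier is added. The price is that the line also follows the clean data less closely.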

Conclusion

In summary, regularization parameters have a large impact on models trained on imbalanced or noisy data, both of which are common in real-world data. It is therefore important to tune the hyperparameters carefully to get the best model.
