Learn important Feature Scaling Techniques in Machine Learning | Feature Engineering Part - IV

Welcome to this hands-on feature engineering series. In this tutorial, we discuss different feature scaling techniques, study their impact on our variables, and look at which technique suits which kind of problem statement. The headings below cover the main topics of this article.


A brief introduction to Feature Scaling

Feature scaling is a data preprocessing (feature engineering) technique that transforms the values of a feature so that all features lie on a common scale, such as 0 to 1 or -1 to 1.

👉 We use feature scaling so that the model sees all values on a common scale, which can improve performance and help the model generalize well.
👉 When features such as price, fees, or salary lie in very different ranges, we use feature scaling to bring their values into a particular range.

Why Does Feature Scaling Matter?

Feature scaling before modeling matters in most cases because of the following factors.

  • The scale of the variable directly influences the regression coefficients.
  • It can reduce the computation time of the model, and it makes it easier for algorithms such as SVC or KNN to find the support vectors or neighbors.
  • Gradient descent converges faster if the values of the features are on a common scale.
  • Euclidean distance is sensitive to feature magnitude, so features with larger values dominate it.
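
The last point can be illustrated with a small sketch. The three rows below are made-up values (age and salary are assumed example features, not columns from the dataset used in this tutorial). The code compares Euclidean distances before and after a simple min-max rescaling.

```python
import numpy as np

# Made-up rows: [age in years, salary]. Values are illustrative only.
X = np.array([
    [25.0, 50_000.0],   # person a
    [60.0, 50_500.0],   # person b: very different age, similar salary
    [26.0, 52_000.0],   # person c: similar age, different salary
])

def euclidean(u, v):
    """Plain Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((u - v) ** 2)))

# Raw distances: salary dominates because its values are much larger.
raw_ab = euclidean(X[0], X[1])
raw_ac = euclidean(X[0], X[2])

# Rescale each column to [0, 1], then recompute the distances.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
scaled_ab = euclidean(X_scaled[0], X_scaled[1])
scaled_ac = euclidean(X_scaled[0], X_scaled[2])

# Share of the squared a-b distance that comes from the age column.
age_share_raw = (X[0, 0] - X[1, 0]) ** 2 / raw_ab ** 2
age_share_scaled = (X_scaled[0, 0] - X_scaled[1, 0]) ** 2 / scaled_ab ** 2

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}  age share={age_share_raw:.3f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}  age share={age_share_scaled:.3f}")
```

On the raw values, the 35-year age gap between a and b contributes under 1% of the squared distance. After rescaling, both features contribute on comparable terms.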

So, feature scaling plays a vital role in the feature engineering process: it improves model performance, reduces computation time, and prepares data in a form that machine learning algorithms can work with easily.

Now let's get started and learn the different feature scaling techniques. Along with standardization and normalization, there are a few more techniques that are used in some cases.

Overview of Dataset

The dataset we will use for learning and practicing each technique is the same as in our previous tutorial: the well-known Titanic dataset.

Different Feature Scaling Techniques

We will be discussing the following feature scaling techniques that are widely used across different machine learning problem statements.
  • Standardization
  • Min-Max Scaling (Normalization)
  • Mean Normalization
  • Robust Scaling (scale to median and IQR)
  • Scale to Absolute Maximum

1) Standardization

This is the most commonly used technique to scale features. It is based on the spread of the values in the feature: the technique centers the variable at 0 and standardizes its variance to 1.

    FORMULA: $$X_{\text{scaled}} = \frac{X - \mathrm{Mean}(X)}{\mathrm{Std}(X)}$$

The procedure is to subtract the mean of the feature from each value and divide the result by the standard deviation of that feature. To apply this technique, the scikit-learn library provides a pre-built class, StandardScaler.
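
A minimal sketch of this procedure with scikit-learn's StandardScaler follows. The array is a small made-up stand-in for two numeric columns (labelled Age and Fare here as an assumption), not the actual Titanic data. The manual line is included only to show that the scaler matches the formula.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up stand-in for two numeric columns: [Age, Fare].
X = np.array([
    [22.0, 7.25],
    [38.0, 71.28],
    [26.0, 7.93],
    [35.0, 53.10],
    [54.0, 51.86],
])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # learns mean and std per column, then scales

# Same result by hand: (X - mean) / std, with the population std (ddof=0).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print("column means after scaling:", X_std.mean(axis=0))
print("column stds after scaling: ", X_std.std(axis=0))
```

In practice the scaler would normally be fitted on the training split only and then applied to the test split with `scaler.transform`.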

❗ Important Points to Remember

  • Centers the mean at 0 and scales the variance to 1
  • It preserves the shape of the distribution
  • It is suggested that this technique should be used when a feature has a normal distribution.
  • It preserves the outliers if any are present.

2) Min - Max Scaler (Normalization Technique)

Also known as min-max normalization, this is the simplest technique: it rescales the feature to the range [0, 1] using its minimum and maximum values. The procedure is to subtract the minimum value from each observation and divide the result by the difference between the maximum and minimum values.

   FORMULA: $$X_{\text{scaled}} = \frac{X - \min(X)}{\max(X) - \min(X)}$$

We can implement this formula easily ourselves, and the scikit-learn library also provides it as MinMaxScaler.
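
Both routes can be sketched side by side. The array is again a small made-up stand-in for two numeric columns (assumed to be Age and Fare), and the default `feature_range=(0, 1)` of MinMaxScaler is used.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up stand-in for two numeric columns: [Age, Fare].
X = np.array([
    [22.0, 7.25],
    [38.0, 71.28],
    [26.0, 7.93],
    [35.0, 53.10],
    [54.0, 51.86],
])

scaler = MinMaxScaler()          # default feature_range=(0, 1)
X_mm = scaler.fit_transform(X)   # learns min and max per column, then rescales

# Same result by hand: (X - min) / (max - min).
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print("column minimums after scaling:", X_mm.min(axis=0))
print("column maximums after scaling:", X_mm.max(axis=0))
```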

❗ Important Points to Remember

  • It does not center the mean at 0.
  • The resulting variance varies from variable to variable.
  • It is suggested to apply this technique when a feature does not follow a normal distribution.
  • It is good to use this technique with distance-based algorithms.
  • Neural networks typically work better with inputs scaled to a small range such as [0, 1], so normalization plays an important role there.

3) Mean Normalization

Mean normalization is similar to min-max normalization. The difference is that it centers the variable at its mean, so the scaled values lie within the range [-1, 1]. The procedure is to subtract the mean from each observation and divide the result by the difference between the maximum and minimum values.

   FORMULA: $$X_{\text{scaled}} = \frac{X - \mathrm{Mean}(X)}{\max(X) - \min(X)}$$
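
As far as I know, scikit-learn does not ship a dedicated transformer for mean normalization, so the sketch below applies the formula directly with NumPy. The array is a made-up stand-in for two numeric columns (assumed to be Age and Fare).

```python
import numpy as np

# Made-up stand-in for two numeric columns: [Age, Fare].
X = np.array([
    [22.0, 7.25],
    [38.0, 71.28],
    [26.0, 7.93],
    [35.0, 53.10],
    [54.0, 51.86],
])

# Mean normalization per column: (X - mean) / (max - min).
col_mean = X.mean(axis=0)
col_range = X.max(axis=0) - X.min(axis=0)
X_mn = (X - col_mean) / col_range

print("column means after scaling:", X_mn.mean(axis=0))
print("column minimums:", X_mn.min(axis=0))
print("column maximums:", X_mn.max(axis=0))
```

The scaled mean is 0 and every value falls inside [-1, 1], with the distance between the scaled maximum and minimum equal to 1.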

❗ Important Points to Remember

  • Scales the variable to between -1 and 1
  • May change the distribution of a variable
  • Preserves outliers, if any exist

4) Robust Scaling

It is also known as scaling to the median and IQR because it does not center the mean; instead, it centers the median at 0. The procedure is to subtract the median from each observation and divide the result by the interquartile range (IQR).

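    $$\mathrm{IQR}(X) = Q_{75}(X) - Q_{25}(X)$$ where $Q_{75}$ and $Q_{25}$ are the 75th and 25th percentiles of the feature.

   FORMULA: $$X_{\text{scaled}} = \frac{X - \mathrm{Median}(X)}{\mathrm{IQR}(X)}$$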
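
A minimal sketch with scikit-learn's RobustScaler, using its default settings (median centering and the 25th to 75th percentile range). The array is made up, with one deliberately extreme value in the second column to play the role of an outlier. The manual computation is there to show agreement with the formula.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Made-up stand-in for two numeric columns: [Age, Fare].
# The last Fare value is deliberately extreme to act as an outlier.
X = np.array([
    [22.0, 7.25],
    [38.0, 71.28],
    [26.0, 7.93],
    [35.0, 53.10],
    [54.0, 512.33],
])

scaler = RobustScaler()          # defaults: center on median, scale by IQR
X_rb = scaler.fit_transform(X)

# Same result by hand: (X - median) / (75th percentile - 25th percentile).
median = np.median(X, axis=0)
q75, q25 = np.percentile(X, [75, 25], axis=0)
X_manual = (X - median) / (q75 - q25)

print("column medians after scaling:", np.median(X_rb, axis=0))
print("column maximums after scaling:", X_rb.max(axis=0))
```

The outlier is still present after scaling, but it does not influence the median or the IQR that define the scale.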

❗ Important Points to Remember

  • Centers the median to 0
  • Maximum and minimum values may vary
  • It preserves the shape of the distribution of a variable
  • It is robust to outliers
  • Does not scale the feature to a fixed range such as [-1, 1]; the scaled feature has a median of 0 and an IQR of 1

5) Scaling to Absolute Maximum

It is a pretty simple technique that scales the feature to the range -1 to 1 by dividing each observation by the maximum absolute value of the feature. It is mostly used when you are working with sparse data, that is, when a large number of zeros are present in your data: dividing by a constant leaves the zeros unchanged.

   FORMULA: $$X_{\text{scaled}} = \frac{X}{\max(|X|)}$$
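
A minimal sketch with scikit-learn's MaxAbsScaler on a small made-up array that contains many zeros and one negative value. The manual line shows the same formula applied with NumPy.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Made-up, mostly-zero data with two columns (values are illustrative only).
X = np.array([
    [0.0, -4.0],
    [2.0,  0.0],
    [0.0,  8.0],
    [5.0,  0.0],
])

scaler = MaxAbsScaler()
X_ma = scaler.fit_transform(X)   # divides each column by its max absolute value

# Same result by hand: X / max(|X|) per column.
X_manual = X / np.abs(X).max(axis=0)

print(X_ma)
```

Because the data is only divided and never shifted, every zero in the input stays zero in the output.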

❗ Important Points to Remember

  • The resulting variance is not fixed; it varies from variable to variable.
  • The resulting mean is not centered.
  • It is sensitive to outliers.

👉 You can access the complete Kaggle notebook for each technique here to copy and practice. ✌


Feature scaling speeds up the training of your model, but it is not always useful. It is necessary or beneficial whenever your features span very different ranges of values, or when you are working with neural networks or image processing.

I hope each technique is clear. If anything is missing, or if you know of another technique that should be included in this article, please let me know.

Thank You! 😊
Keep Learning, Happy Learning.


