Standardization

Chapter: Preprocessing / Lesson 3


English text of the lesson

The most common problem when working with numerical data is the difference in magnitudes.

As we mentioned in the first lesson, an easy fix for this issue is standardization.

Other names by which you may have heard this term are feature scaling and normalization.

However, normalization could refer to a few additional concepts, even within machine learning, which is why we will stick with the terms standardization and feature scaling.

Standardization, or feature scaling, is the process of transforming the data we are working with into a standard scale.

A very common way to approach this problem is by subtracting the mean and dividing by the standard deviation.
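In symbols, the transformation just described can be sketched as follows. The notation is an assumption introduced here for illustration (the lesson does not name its symbols): $x_i$ are the raw values of one variable, $\bar{x}$ their mean, and $s$ their sample standard deviation with an $n-1$ denominator. The last two lines check what the mean and variance of the transformed values $z_i$ come out to; the same argument goes through with an $n$ denominator as long as it is used consistently.

```latex
\begin{align*}
z_i &= \frac{x_i - \bar{x}}{s},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \\[6pt]
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} \frac{x_i - \bar{x}}{s}
         = \frac{1}{s}\left(\frac{1}{n}\sum_{i=1}^{n} x_i - \bar{x}\right)
         = 0 \\[6pt]
s_z^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (z_i - \bar{z})^2
       = \frac{1}{s^2}\cdot\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
       = \frac{s^2}{s^2}
       = 1
\end{align*}
```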

In this way, regardless of the dataset, we will always obtain a distribution with a mean of zero and a standard deviation of one, which could easily be proven.

Let's show that with an FX example.

Say our algorithm has two input variables: the euro-dollar exchange rate and the daily trading volume.

We have three days' worth of observations.

First day: 1.3 and 110,000.

Second day: 1.34 and 98,700.

Third day: 1.25 and 135,000.

The first value shows the euro-dollar exchange rate, while the second one shows the daily trading volume.

Let's standardize these figures.

We standardize the euro-dollar exchange rates relative to the other euro-dollar exchange rates.

So we look at 1.3, 1.34, and 1.25.

The mean is approximately 1.3, while the standard deviation is 0.045.

Going through the above-mentioned transformation, these values become 0.07, 0.96, and -1.03, respectively.

Standardizing the trading volumes, we obtain -0.25, -0.85, and 1.1.
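The arithmetic above can be reproduced with a short script. This is a minimal sketch using only Python's standard library; the function name `standardize` and the variable names are hypothetical, and it assumes the sample standard deviation (the $n-1$ denominator), which is the choice that reproduces the 0.045 quoted in the lesson.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return [(v - mu) / sigma for v in values]


# The three days of observations from the lesson.
exchange_rates = [1.30, 1.34, 1.25]
trading_volumes = [110_000, 98_700, 135_000]

# Each variable is standardized against its own mean and standard deviation.
rates_z = standardize(exchange_rates)
volumes_z = standardize(trading_volumes)

print([round(z, 2) for z in rates_z])    # [0.07, 0.96, -1.03]
print([round(z, 2) for z in volumes_z])  # [-0.25, -0.85, 1.1]
```

Each column is scaled independently, so the exchange rates and the trading volumes end up on a comparable scale even though their raw magnitudes differ by about five orders of magnitude.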

In this way, we have forced figures of very different scales to appear similar.

That’s why another name for standardization is feature scaling.

This will ensure our linear combinations treat the two variables equally.

Also, it is much easier to make sense of the data.

The transformation allowed us to turn the trading volumes from 110,000, 98,700, and 135,000 into -0.25, -0.85, and 1.1.

In this way, the third value is considerably higher than the average, while the first one is around the average.

We can confidently say that 135,000 trades per day is a high figure, while 98,700 is low.

Please disregard the simplification of having just three observations.

That’s just an example.

Besides standardization, there are other popular methods too.

We will briefly introduce them without going into too much detail.

Initially, we said that normalization refers to several concepts.

One of them, which comes up often in machine learning, consists of converting each sample into a unit-length vector using the L1 or L2 norm.

Another preprocessing method is PCA, which stands for principal components analysis.

It is a dimension reduction technique often used when working with several variables referring to the same bigger concept or latent variable.

For instance, if we have data about one’s religion, voting history, participation in different associations, and upbringing, we can combine these four to reflect his or her attitude towards immigration.

This new variable will normally be standardized in a range with a mean of zero and a standard deviation of one.

Whitening is another technique frequently used for preprocessing.

It is often performed after PCA and removes most of the underlying correlations between data points.

Whitening can be useful when, conceptually, the data should be uncorrelated, but that’s not reflected in the observations.
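Of the alternatives mentioned above, unit-length normalization is the simplest to sketch in a few lines. The example below is a minimal illustration using only the standard library; the function name `normalize_sample` and its `norm` parameter are hypothetical, not taken from the lesson or from any particular library. Note that, unlike standardization, it rescales each sample (one row of features) on its own rather than each variable across samples.

```python
import math


def normalize_sample(sample, norm="l2"):
    """Scale one sample (a list of feature values) to unit length."""
    if norm == "l1":
        length = sum(abs(v) for v in sample)            # L1 norm: sum of absolute values
    elif norm == "l2":
        length = math.sqrt(sum(v * v for v in sample))  # L2 norm: Euclidean length
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    if length == 0:
        return list(sample)  # an all-zero sample has no direction; leave it unchanged
    return [v / length for v in sample]


sample = [3.0, 4.0]
print(normalize_sample(sample, "l2"))  # [0.6, 0.8]
print(normalize_sample(sample, "l1"))  # absolute values of the components sum to 1
```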

We can’t cover all the strategies, as each strategy is problem-specific.

However, standardization is the most common one and is the one we will employ in the practical examples we will face in this course.

In the next lesson, we will see how to deal with categorical data.

Thanks for watching.
