Standardization
English transcript of the lesson
The most common problem when working with numerical data is the difference in magnitudes. As we mentioned in the first lesson, an easy fix for this issue is standardization. Other names by which you may have heard this term are feature scaling and normalization. However, normalization could refer to a few additional concepts, even within machine learning, which is why we will stick with the terms standardization and feature scaling. Standardization, or feature scaling, is the process of transforming the data we are working with into a standard scale. A very common way to approach this problem is by subtracting the mean and dividing by the standard deviation.
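As a minimal sketch of that transformation, assuming NumPy (this snippet is an illustration, not the lesson's own code):

```python
import numpy as np

def standardize(x):
    """Subtract the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)  # ddof=1 -> sample standard deviation
```

Here ddof=1 picks the sample standard deviation, which is the convention the figures in the example below follow.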
In this way, regardless of the dataset, we will always obtain a distribution with a mean of zero and a standard deviation of one, which can easily be proven. Let's show that with an FX example. Say our algorithm has two input variables: the euro/dollar exchange rate and the daily trading volume.
We have three days' worth of observations. First day: 1.30 and 110,000. Second day: 1.34 and 98,700. And the third day: 1.25 and 135,000.
The first value shows the euro/dollar exchange rate, while the second one shows the daily trading volume. Let's standardize these figures. We standardize the euro/dollar exchange rates with regard to the other euro/dollar exchange rates, so we look at 1.30, 1.34, and 1.25. The mean is 1.3, while the standard deviation is 0.045. Going through the above-mentioned transformation, these values become 0.07, 0.96, and -1.03, respectively. Standardizing the trading volumes, we obtain -0.25, -0.85, and 1.10.
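A quick sketch that reproduces the figures above (again assuming NumPy and the sample standard deviation):

```python
import numpy as np

rates   = np.array([1.30, 1.34, 1.25])          # euro/dollar exchange rates
volumes = np.array([110_000, 98_700, 135_000])  # daily trading volumes

# Subtract the mean, divide by the sample standard deviation.
z_rates   = (rates - rates.mean()) / rates.std(ddof=1)
z_volumes = (volumes - volumes.mean()) / volumes.std(ddof=1)

print(round(rates.mean(), 2), round(rates.std(ddof=1), 3))  # 1.3 0.045
print(z_rates.round(2))     # [ 0.07  0.96 -1.03]
print(z_volumes.round(2))   # [-0.25 -0.85  1.1 ]
```

Note that a library scaler such as scikit-learn's StandardScaler divides by the population standard deviation instead, so its output would differ slightly on a sample this tiny.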
In this way, we have forced figures of very different scales to appear similar.
That’s why another name for standardization is feature scaling.
This will ensure our linear combinations treat the two variables equally.
Also, it is much easier to make sense of the data. The transformation allowed us to go from trading volumes of 110,000, 98,700, and 135,000 to -0.25, -0.85, and 1.10.
In this way, the third term is considerably higher than the average, while the first one is around the average. We can confidently say that 135,000 trades per day is a high figure, while 98,700 is a low one. Please disregard the simplification of having just three observations.
That’s just an example.
Besides standardization, there are other popular methods, too. We will briefly introduce them, without going into too much detail.
Initially, we said that normalization refers to several concepts. One of them, which comes up often in machine learning, consists of converting each sample into a unit-length vector, using the L1 or L2 norm.
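As a rough sketch of that kind of per-sample normalization (the two samples here are made up for illustration), dividing each row by its own L2 norm turns it into a unit-length vector:

```python
import numpy as np

X = np.array([[3.0, 4.0],
              [1.0, 1.0]])   # two made-up samples

# Divide each sample (row) by its own L2 norm.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

print(X_unit)                          # [[0.6   0.8  ], [0.707 0.707]] (approximately)
print(np.linalg.norm(X_unit, axis=1))  # [1. 1.] -> every sample now has unit length
```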
Another preprocessing method is PCA, which stands for principal component analysis.
It is a dimension reduction technique, often used when working with several variables that refer to the same bigger concept, or latent variable. For instance, if we have data about one's religion, voting history, participation in different associations, and upbringing, we can combine these four to reflect his or her attitude towards immigration. This new variable will normally be standardized, in a range with a mean of zero and a standard deviation of one.
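A rough sketch of how such a reduction could be done with PCA, using scikit-learn; the data here is a synthetic stand-in, since the lesson does not provide an actual dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the four encoded variables from the example
# (religion, voting history, association membership, upbringing).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Collapse the four columns into a single latent component.
pca = PCA(n_components=1)
latent = pca.fit_transform(X)

# Rescale the new variable to a mean of zero and a standard deviation of one.
latent = (latent - latent.mean()) / latent.std(ddof=1)

print(latent.shape)                   # (100, 1)
print(pca.explained_variance_ratio_)  # share of variance captured by the component
```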
Whitening is another technique frequently used for preprocessing. It is often performed after PCA and removes most of the underlying correlations between data points. Whitening can be useful when, conceptually, the data should be uncorrelated, but that's not reflected in the observations.
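A minimal sketch of whitening; one common way (an assumption here, not necessarily the lesson's tool of choice) is PCA with the whiten option, after which the components are uncorrelated and have unit variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Two deliberately correlated columns as a stand-in dataset.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([x, x + 0.1 * rng.normal(size=500)])

X_white = PCA(whiten=True).fit_transform(X)

print(np.cov(X, rowvar=False).round(2))        # strong off-diagonal covariance before
print(np.cov(X_white, rowvar=False).round(2))  # roughly the identity matrix after whitening
```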
We can't cover all the strategies, as each strategy is problem-specific. However, standardization is the most common one, and it is the one we will employ in the practical examples we will face in this course. In the next lesson, we will see how to deal with categorical data.
Thanks for watching.