Types of simple initializations
Now that we know initialization matters, let's see how we can deal with it.
A simple approach would be to initialize weights randomly within a small range.
We did that in the minimal example: we used the NumPy method random.uniform, and our range was between -0.1 and 0.1.
This approach chooses the values randomly, but in a uniform manner: each one has the exact same probability of being chosen.
That sounds intuitive, but it is important to stress it.
You will see why really soon.
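As a rough sketch of the uniform approach described above, the snippet below draws weights and biases from the range -0.1 to 0.1 with np.random.uniform. The layer shape (2 inputs, 1 output) and the variable names are assumptions made for illustration; they are not taken from the lesson.

```python
import numpy as np

# Hypothetical layer shape, chosen only for illustration
input_size, output_size = 2, 1
init_range = 0.1  # the small range mentioned in the lesson

# Every value in [-0.1, 0.1) is equally likely to be drawn
weights = np.random.uniform(low=-init_range, high=init_range,
                            size=(input_size, output_size))
biases = np.random.uniform(low=-init_range, high=init_range,
                           size=output_size)

print(weights)
print(biases)
```

Because the draw is uniform, a weight near 0.09 is just as likely as a weight near 0.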
Here's a second approach: we could choose a normal initializer.
The idea is basically the same.
This time though we pick the numbers from a zero mean normal distribution.
The chosen variance is arbitrary but should be small.
As you can guess, since it follows the normal distribution, values closer to 0 are much more likely to
be chosen than other values.
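A minimal sketch of a normal initializer might look like the following. The standard deviation of 0.1 is used as one small, arbitrary choice, and the matrix shape and seed are assumptions made only so the example is reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed is arbitrary, for reproducibility

std = 0.1  # small, arbitrarily chosen spread
# Hypothetical weight matrix shape, large enough to see the distribution
weights = rng.normal(loc=0.0, scale=std, size=(1000, 10))

# Share of weights that fall within one standard deviation of zero
frac_near_zero = np.mean(np.abs(weights) < std)
print(frac_near_zero)
```

For a normal distribution, roughly two thirds of the draws land within one standard deviation of the mean, which is what "values closer to 0 are much more likely" means in practice.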
An example of such initialization is to draw from a normal distribution with a mean of zero
and a standard deviation of 0.1.
Both methods are somewhat problematic, although they were the norm until 2010.
It was just recently that academics came up with a solution.
Let’s explore the problem.
Weights are used in linear combinations. Then the linear combinations are activated.
Once more, we will use the sigmoid activator.
The sigmoid, like other commonly used non-linearities, is peculiar around its mean and its extremes.
Activation functions take as inputs the linear combination of the units from the previous layer, right?
Well, if the weights are too small, this will cause values that fall around this range.
In this range, unfortunately, the sigmoid is almost linear.
If all our inputs are in this range, which will happen if we use small weights, the sigmoid would not
apply a non-linearity but a linearity to the linear combination. As we discussed, non-linearities are essential
for deep nets.
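To make the "almost linear" point concrete, here is a small sketch that compares the sigmoid with its tangent line at zero, 0.5 + z/4, on linear combinations built from small weights. The input data, the shapes, and the seed are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=0)  # arbitrary seed

# Hypothetical data: 1000 samples with 10 features in [-1, 1)
inputs = rng.uniform(-1.0, 1.0, size=(1000, 10))
# Small weights, as in the simple initializers above
weights = rng.uniform(-0.1, 0.1, size=(10, 1))

z = inputs @ weights  # linear combinations fed to the activation

# Tangent line of the sigmoid at z = 0
linear_approx = 0.5 + z / 4.0
max_gap = np.max(np.abs(sigmoid(z) - linear_approx))

print(np.max(np.abs(z)), max_gap)
```

With these assumed ranges the linear combinations cannot exceed 1 in absolute value, so the sigmoid and the straight line differ by less than 0.02 everywhere. The activation is then behaving almost like a linear function.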
Conversely, if the values are too large or too small, the sigmoid is almost flat,
which causes the output of the sigmoid to be only ones or only zeros, respectively.
A static output of the activations minimizes the gradient,
while the algorithm is not really trained.
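The flat regions can be illustrated with a short sketch that evaluates the sigmoid and its derivative, s(z)(1 - s(z)), at a very negative, a zero, and a very positive input. The specific inputs -10, 0 and 10 are assumptions picked for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-10.0, 0.0, 10.0])  # illustrative inputs
outputs = sigmoid(z)
grads = sigmoid_grad(z)

print(outputs)  # close to 0, exactly 0.5, close to 1
print(grads)    # tiny at both extremes, largest (0.25) at zero
```

At both extremes the derivative is close to zero, so the updates that flow back through the activation are tiny and learning stalls.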
So what we want is a wide range of inputs for this sigmoid.
These inputs depend on the weights, so the weights will have to be initialized in a reasonable range,
so we have a nice variance along the linear combinations.
In the next lesson, we will explore how to do that.
Thanks for watching.