Initialization - Introduction
Hey, this is our first lesson about initialization, which is a crucial part of machine learning.
When you use clumsy or inappropriate methods, even the fastest computer in the world won’t be able to help you.
As they say, the devil’s in the detail.
And this is as true as it gets for initialization.
OK, initialization is the process in which we set the initial values of the weights.
It was important to add this section to the course, as an inappropriate initialization would result
in an unoptimizable model. Let’s revise what we have seen so far.
When we introduced the simplest gradient descent, we used the function 5x^2 + 3x - 4.
We performed the gradient descent in Excel, and you should have that file available.
If you remember, we arbitrarily chose four as an initial value.
In other words, we initialized a weight with a value of four.
The next occasion on which we stumbled upon initial weights was our simple example.
Back then, we initialized them randomly in the range -0.1 to 0.1.
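
As a reminder of that first example, here is a minimal Python sketch of gradient descent on 5x^2 + 3x - 4, starting from the initial value of four. The learning rate and the number of steps are assumptions made for illustration; they are not taken from the Excel file.

```python
def f(x):
    # The function from the first gradient descent lesson.
    return 5 * x**2 + 3 * x - 4


def grad_f(x):
    # Derivative of f with respect to x.
    return 10 * x + 3


def gradient_descent(x0, learning_rate=0.01, steps=1000):
    # learning_rate and steps are illustrative assumptions.
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


# Initialize the single weight with a value of four, as in the lesson.
x_min = gradient_descent(x0=4.0)
print(x_min)  # close to -0.3, where the derivative 10x + 3 equals zero
```

For this simple convex function the starting point only affects how long it takes to reach the minimum, which is why an arbitrary value such as four was good enough there.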
If you haven’t asked yourself this question, now is the time:
does it really matter what the initial weights are?
Well, we wouldn’t create a whole section about it if it didn’t.
In the minimal example, we said we would need random initial weights, but we didn’t elaborate why.
OK, you can see the same scheme we used for our backpropagation lessons.
This is a model with a single hidden layer.
Let’s initialize our weights and biases in such a way that they are all equal to a constant.
It doesn’t matter which constant. As you can see, the three hidden units are completely symmetrical with
respect to the inputs.
Each hidden unit is a function of one weight coming from x1 and one from x2.
If all the weights are equal, there is no reason for the algorithm to learn that h1, h2 and h3 are different.
There is no reason for the algorithm to think that even our outputs are different.
Based on this symmetry, when backpropagating, all the weights are bound to be updated without distinguishing
between the different nodes in the net.
Some optimization would still take place, so the weights won’t stay at their initial value.
Still, the weights would remain useless.
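
The symmetry argument can be checked with a small NumPy sketch. It assumes a network with two inputs, three sigmoid hidden units and one linear output, trained by plain gradient descent on a squared-error loss over made-up data. The activation, loss, data, learning rate and the helper names `train` and `hidden_units_identical` are all assumptions for illustration, not the course's own code.

```python
import numpy as np


def train(W1, b1, W2, b2, X, y, lr=0.1, epochs=200):
    """Gradient descent on a 2-3-1 net (sigmoid hidden layer, squared-error loss)."""
    W1, b1, W2, b2 = W1.copy(), b1.copy(), W2.copy(), b2.copy()
    n = X.shape[0]
    for _ in range(epochs):
        # Forward propagation.
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))  # hidden units, shape (n, 3)
        out = H @ W2 + b2                          # outputs, shape (n, 1)
        # Backpropagation.
        d_out = (out - y) / n
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        dZ = (d_out @ W2.T) * H * (1.0 - H)
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        # Update step.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def hidden_units_identical(W1):
    # True when every hidden unit (column of W1) has the same incoming weights.
    return bool(np.allclose(W1, W1[:, :1]))


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # toy inputs x1, x2
y = 2.0 * X[:, :1] - X[:, 1:]          # toy targets

c = 0.05                               # any constant gives the same symmetry
W1c, b1c, W2c, b2c = train(np.full((2, 3), c), np.full(3, c),
                           np.full((3, 1), c), np.full(1, c), X, y)

W1r, b1r, W2r, b2r = train(rng.uniform(-0.1, 0.1, (2, 3)), rng.uniform(-0.1, 0.1, 3),
                           rng.uniform(-0.1, 0.1, (3, 1)), rng.uniform(-0.1, 0.1, 1), X, y)

print("constant init, hidden units identical:", hidden_units_identical(W1c))
print("random init, hidden units identical:  ", hidden_units_identical(W1r))
```

With the constant start, the weights do move away from the constant, but the three hidden units keep identical weights after every update, so the layer behaves like a single unit. With the random start in the range -0.1 to 0.1, the units are free to become different.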
Nice to know that.
So how are we supposed to initialize the weights, then? We’ll see this in our next lesson.