# Xavier initialization




### English text of the lesson

OK, in this lesson we'll continue talking about initializations.

A more advanced strategy is the Xavier initialization, also known as Glorot initialization.

Who are Xavier and Glorot?

Well, that's one person, actually: Xavier Glorot.

He is the only academic I know of whose work is named after his first name rather than his last name.

Quite intriguing.

In any case, Mr. Glorot, or simply Xavier, proposed this method in 2010, and it was quickly adopted on a large scale.

It's a state-of-the-art technique.

So you probably want to get acquainted with it.

So the range for the first two cases was arbitrarily chosen by us, right?

There are both a uniform Xavier initialization and a normal Xavier initialization.

The main idea is that the method used for randomization isn't so important; it is the number of outputs in the following layer that matters.

With the passing of each layer, the Xavier initialization maintains the variance within some bounds, so we can take full advantage of activation functions.
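To make the variance argument concrete, here is a minimal sketch in plain Python. It assumes a stack of purely linear layers (no activation functions), a layer width of 512, a depth of 5, and a fixed range of 0.1 for comparison; all of these values and the helper names `linear_forward` and `variance_after` are illustrative assumptions, not something taken from the lesson.

```python
import math
import random
import statistics

def linear_forward(x, weights):
    """Compute y = x @ weights for a vector x and an n_in x n_out weight matrix."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(n_out)]

def variance_after(depth, width, limit, rng):
    """Push a unit-variance random vector through `depth` linear layers whose
    weights are drawn from U(-limit, limit); return the variance of the output."""
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        w = [[rng.uniform(-limit, limit) for _ in range(width)] for _ in range(width)]
        x = linear_forward(x, w)
    return statistics.pvariance(x)

width, depth = 512, 5  # hypothetical sizes, chosen only for illustration

# Xavier uniform bound for a layer with `width` inputs and `width` outputs.
xavier_limit = math.sqrt(6.0 / (width + width))

var_xavier = variance_after(depth, width, xavier_limit, random.Random(0))
var_fixed = variance_after(depth, width, 0.1, random.Random(0))  # arbitrary range

print("output variance, Xavier range:", var_xavier)
print("output variance, fixed range :", var_fixed)
```

With the Xavier range the output variance should stay close to the input variance of about 1, while the arbitrarily chosen range lets it drift by a constant factor at every layer. Real networks also apply activation functions, so this only illustrates the trend.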

There are two formulas.

The uniform Xavier initialization states that we should draw each weight w from a random uniform distribution in the range from minus x to x, where x is equal to the square root of 6 divided by the sum of the number of inputs and the number of outputs for the transformation.
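The uniform rule can be sketched in a few lines of standard-library Python. The function name `xavier_uniform` and the layer sizes (100 inputs, 50 outputs) are hypothetical, chosen so that the bound comes out to a round number.

```python
import math
import random

def xavier_uniform(n_in, n_out, rng=None):
    """Return an n_in x n_out weight matrix drawn from U(-x, x),
    where x = sqrt(6 / (n_in + n_out))."""
    rng = rng or random.Random()
    bound = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-bound, bound) for _ in range(n_out)] for _ in range(n_in)]

# Hypothetical layer: 100 inputs, 50 outputs.
weights = xavier_uniform(100, 50, rng=random.Random(0))

# The bound for this layer: sqrt(6 / 150) = sqrt(0.04) = 0.2
limit = math.sqrt(6.0 / (100 + 50))
print("bound:", limit)
print("largest absolute weight:", max(abs(w) for row in weights for w in row))
```

Every weight falls inside the range from minus 0.2 to 0.2 here; a wider layer would get a narrower range.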

For the normal Xavier initialization, we draw each weight w from a normal distribution with a mean of zero and a standard deviation equal to the square root of 2 divided by the sum of the number of inputs and the number of outputs for the transformation.

The numerator values 2 and 6 vary across sources, but the idea is the same.
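A matching sketch for the normal variant follows. It assumes the standard deviation is the square root of 2 divided by the number of inputs plus outputs, which gives the same variance as the uniform rule with numerator 6, since a uniform distribution on the range from minus x to x has variance x squared over 3. The function name `xavier_normal` and the 200 by 200 layer size are hypothetical.

```python
import math
import random
import statistics

def xavier_normal(n_in, n_out, rng=None):
    """Return an n_in x n_out weight matrix drawn from N(0, std^2),
    where std = sqrt(2 / (n_in + n_out)) (an assumption, see the text above)."""
    rng = rng or random.Random()
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]

# Hypothetical layer: 200 inputs, 200 outputs.
w = xavier_normal(200, 200, rng=random.Random(0))
flat = [v for row in w for v in row]

target_std = math.sqrt(2.0 / (200 + 200))  # sqrt(0.005)
sample_std = statistics.pstdev(flat)

print("target standard deviation:", target_std)
print("sample standard deviation:", sample_std)
```

With 40,000 weights, the sample standard deviation should land very close to the target.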

Another detail you should notice is that the number of inputs and outputs matters.

Outputs are clear: that's where the activation function is going.

So the higher the number of outputs, the higher the need to spread weights.

What about inputs?

Well, optimization is done through backpropagation.

So when we backpropagate, we would obviously have the same problem, but in the opposite direction.

OK.

Finally, in TensorFlow this is the default initializer.

So if you initialize the variables without specifying how, it will automatically adopt the Xavier initializer, unlike what we did in the minimal example.

Super interesting stuff.

Next time we will challenge the gradient descent.

Thanks for watching.
