Types of simple initializations

Course: Deep Learning with TensorFlow / Chapter: Initialization / Lesson 2



English transcript of the lesson

Now that we know initialization matters, let's see how we can deal with it.

A simple approach would be to initialize weights randomly within a small range.

We did that in the minimal example: we used the NumPy method random.uniform, and our range was between -0.1 and 0.1. This approach chooses the values randomly.

But in a uniform manner: each value has the exact same probability of being chosen, an equal probability of being selected.

Sounds intuitive, but it is important to stress it. You will soon see why, really soon.
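The uniform-range approach described above can be sketched in NumPy like this (the layer shape is an illustrative assumption, not taken from the lesson):

```python
import numpy as np

# A reproducible random generator; the seed is an arbitrary choice.
rng = np.random.default_rng(seed=42)

# Initialize weights for a small layer (2 inputs, 1 output is an
# assumed shape), drawing every entry uniformly from [-0.1, 0.1].
init_range = 0.1
weights = rng.uniform(low=-init_range, high=init_range, size=(2, 1))
biases = rng.uniform(low=-init_range, high=init_range, size=1)

# Every value in the interval is equally likely to be drawn,
# so all entries land inside the chosen range.
assert np.all(np.abs(weights) <= init_range)
print(weights.shape)  # (2, 1)
```

In TensorFlow 2, the same idea is available as the built-in `tf.keras.initializers.RandomUniform(minval=-0.1, maxval=0.1)`.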

Here's a second approach: we could choose a normal initializer.

The idea is basically the same.

This time though we pick the numbers from a zero mean normal distribution.

The chosen variance is arbitrary but should be small.

As you can guess, since it follows the normal distribution, values closer to 0 are much more likely to be chosen than other values.

An example of such initialization is to draw from a normal distribution with a mean of zero and a standard deviation of 0.1.

Both methods are somewhat problematic, although they were the norm until 2010.
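The normal initialization just described can be sketched the same way (again, the layer shape is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Draw weights from a zero-mean normal distribution with a small
# standard deviation (0.1, as in the lesson). Values near 0 are
# much more likely to appear than values far from 0.
weights = rng.normal(loc=0.0, scale=0.1, size=(2, 1))

# Sanity check on a larger sample: the empirical mean is near 0
# and the empirical standard deviation is near 0.1.
sample = rng.normal(loc=0.0, scale=0.1, size=10_000)
print(round(sample.mean(), 3), round(sample.std(), 3))
```

In TensorFlow 2, the equivalent built-in initializer is `tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.1)`.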

It was just recently that academics came up with a solution.

Let’s explore the problem.

Weights are used in linear combinations; then the linear combinations are activated. Once more, we will use the sigmoid activator. The sigmoid, like other commonly used non-linearities, is peculiar around its mean and its extremes. Activation functions take as inputs the linear combinations of the units from the previous layer, right?

Well, if the weights are too small, the resulting values will fall in this range around zero.

Unfortunately, in this range the sigmoid is almost linear.

If all our inputs are in this range, which will happen if we use small weights, the sigmoid would apply not a nonlinearity but a linearity to the linear combination. As we discussed, non-linearities are essential for deep nets.

Conversely, if the values are too large or too small, the sigmoid is almost flat, which causes the output of the sigmoid to be only ones or only zeros, respectively. A static output of the activations minimizes the gradient, so the algorithm is not really trained.
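A quick numerical check makes this concrete. The sketch below (plain NumPy, not from the lesson) evaluates the sigmoid and its derivative near zero and at an extreme:

```python
import numpy as np

def sigmoid(x):
    """The sigmoid activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Near 0 the sigmoid is almost linear and its gradient is largest.
print(sigmoid_grad(0.0))   # 0.25, the maximum possible value

# At the extremes the sigmoid saturates: outputs are essentially
# 1 (or 0 for large negative inputs) and the gradient is tiny,
# so gradient-based training barely updates the weights.
print(sigmoid(10.0))       # ≈ 0.99995
print(sigmoid_grad(10.0))  # ≈ 4.5e-05
```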

So what we want is a wide range of inputs for the sigmoid.

These inputs depend on the weights, so the weights will have to be initialized in a reasonable range, so that we have a nice variance along the linear combinations.

In the next lesson, we will explore how to do that.

Thanks for watching.
