Gradient descent pitfalls

Course: Deep Learning with TensorFlow / Chapter: Gradient descent and learning rates / Lesson 2


Brief description

  • Level: very difficult


English transcript of the lesson

Hey, I’ve been waiting for this lecture for almost the entire course.

It is of utmost importance, but it would not have made sense to show it earlier.

So far we have seen both the gradient descent and the stochastic gradient descent.

They are the logical ways to train our models.

Let’s say this is the graph of our loss function. A gradient descent algorithm would begin here and start descending. A single-batch GD would be slow, but would eventually reach the minimum in a consistent manner.

A stochastic gradient descent algorithm would move through a greater number of points but much faster.

In the end, it is likely we’ll get an approximate answer rather than the exact one.

Still, as we said in the previous lecture, with the savings in terms of computation speed, it is well worth the tradeoff.
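The tradeoff above can be sketched numerically. This is a minimal toy comparison, not from the lecture: the data, learning rate, and step counts are all invented for illustration. We fit a single weight with full-batch gradient descent (one smooth step per pass) and with per-sample stochastic updates (many noisy steps).

```python
import numpy as np

# Hypothetical toy data: y = 3*x plus noise; we fit a single weight w.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=256)
y = 3.0 * x + rng.normal(scale=0.1, size=256)

def grad(w, xb, yb):
    # Gradient of the mean squared error mean((xb*w - yb)**2) w.r.t. w.
    return 2.0 * np.mean(xb * (xb * w - yb))

# Full-batch gradient descent: one consistent step per epoch.
w_gd = 0.0
for _ in range(100):
    w_gd -= 0.1 * grad(w_gd, x, y)

# Stochastic gradient descent: one noisy step per sample, far more updates.
w_sgd = 0.0
for _ in range(2):
    for i in rng.permutation(len(x)):
        w_sgd -= 0.1 * grad(w_sgd, x[i:i+1], y[i:i+1])

print(w_gd, w_sgd)  # both land near 3.0; the SGD estimate is noisier
```

On a simple convex problem like this both methods succeed; SGD's per-step cost is tiny, which is where the speed saving comes from, at the price of a noisier final answer.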

In real life, though, loss functions are not so regular.

What if I told you this was not the whole graph of the loss function?

It was just one of its minima: a local impostor rather than the sought extremum.

Zooming out, we see that the global minimum of the loss is actually this point.

Each local minimum is a suboptimal solution to the machine learning optimization problem.

Gradient descent is prone to this issue.

Often it falls into the minimum closest to the starting point, rather than the global one.
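Getting trapped this way is easy to reproduce on a one-dimensional toy loss. The double-well function below is hypothetical (not from the course): it has a shallow local minimum near w ≈ +1 and a deeper global minimum near w ≈ −1, and plain gradient descent ends up in whichever valley is nearest its starting point.

```python
def loss(w):
    # Hypothetical double-well loss: local minimum near w = +1,
    # deeper global minimum near w = -1.
    return (w**2 - 1.0)**2 + 0.3 * w

def dloss(w):
    # Derivative of the loss above.
    return 4.0 * w * (w**2 - 1.0) + 0.3

def gradient_descent(w0, lr=0.01, steps=2000):
    w = w0
    for _ in range(steps):
        w -= lr * dloss(w)
    return w

w_right = gradient_descent(0.5)   # starts right of the hump: trapped near +1
w_left = gradient_descent(-0.5)   # starts left of the hump: reaches near -1
print(w_right, w_left)
print(loss(w_right) > loss(w_left))  # the local minimum is suboptimal
```

Both runs use identical settings; only the starting point differs, which is exactly the sensitivity described above.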

Of course it depends on the learning rate as well.

A higher learning rate may miss the first local minimum and fall directly into the global valley.

However, it is likely to oscillate and never reach it.
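The learning-rate effect can be sketched on a toy double-well loss (again invented for illustration, not taken from the course). With a small rate, gradient descent settles into the nearest local minimum; with a larger rate, the first step overshoots that minimum entirely and lands in the global valley. Push the rate higher still and the iterates would start to oscillate or diverge, which is the risk mentioned above.

```python
def dloss(w):
    # Derivative of a hypothetical double-well loss (w**2 - 1)**2 + 0.3*w,
    # with a local minimum near +1 and the global minimum near -1.
    return 4.0 * w * (w**2 - 1.0) + 0.3

def descend(w, lr, steps=2000):
    for _ in range(steps):
        w -= lr * dloss(w)
    return w

# Small learning rate: settles into the nearest (local) minimum near +1.
print(descend(2.0, lr=0.01))

# Larger learning rate: the big first step overshoots the local minimum
# and the iterates descend into the global valley near -1.
print(descend(2.0, lr=0.08))
```

Same starting point, same loss; only the learning rate changes the valley we end up in.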

We’ll talk more about the learning rate in our next lessons.

For now let’s focus on the gradient descent.

So, does this mean the gradient descent optimization method is not almighty?

Not necessarily; remedies can be applied to reach the desired result.

Well, if this lesson was about realizing this problem exists, the next one will address the solution.

Stay tuned and thanks for watching.
