Adaptive moment estimation

Course: Deep Learning with TensorFlow / Chapter: Gradient descent and learning rates / Lesson 7


English transcript of the lesson

So far, we have seen two different optimizers, each of which brought a new, bright idea to the update rule.

It would be even better if we could combine these concepts and obtain even better results, right?

OK, let's see how this is done.

If we take the learning rate schedules and the momentum, we reach a version of our optimization algorithm that I find most useful.

It's called adaptive moment estimation.

It is the most advanced optimizer applied in practice.

It's also relatively new, as it was proposed on the 22nd of December 2014.

Stated differently, if somebody you know studied machine learning at university in 2014 or 2015, they probably did not see this method.

This also reinforces the idea that your ML preparation doesn't stop after a course you take.

The trends are ever changing and you should stay informed and up to date.

Back to Adam.

Adam is the topic of this lesson and is short for adaptive moment estimation.

If you noticed, AdaGrad and RMSProp did not include momentum.

Adam builds on RMSProp and introduces momentum into the equation.

So the update rule derived from RMSProp changes from:

Δw(t) = − η / ( g(t) + ε ) · ∂L/∂w(t)

to:

Δw(t) = − η / ( g(t) + ε ) · m(t)

m is the momentum we discussed earlier, but it is a bit transformed:

m(t) = α · m(t−1) + (1 − α) · ∂L/∂w(t), and naturally m(0) is equal to zero.
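To make the rule concrete, here is a minimal NumPy sketch of one such update step. It is only an illustration of the formulas above, not the course's own code: g is the RMSProp-style moving average of squared gradients from the previous lesson (divided under a square root, as in the standard formulation), m is the momentum term just defined, and the names and hyperparameter values (eta, alpha, beta, eps) are placeholders. The bias correction from the original Adam paper is not shown.

```python
import numpy as np

def adam_like_step(w, grad, g, m, eta=0.001, beta=0.9, alpha=0.9, eps=1e-8):
    """One simplified Adam-style update: RMSProp denominator plus momentum numerator."""
    g = beta * g + (1 - beta) * grad ** 2   # moving average of squared gradients (RMSProp)
    m = alpha * m + (1 - alpha) * grad      # momentum: moving average of the gradients, m(0) = 0
    w = w - eta / (np.sqrt(g) + eps) * m    # m replaces the raw gradient in the update
    return w, g, m

# Example usage on a single weight, minimizing L(w) = w^2 (gradient = 2w)
w, g, m = 5.0, 0.0, 0.0
for t in range(2000):
    w, g, m = adam_like_step(w, 2 * w, g, m, eta=0.01)
print(w)  # w has been driven close to the minimum at 0
```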

I always use Adam, as it is a cutting-edge machine learning method.

Take a moment to appreciate what you’ve seen in the previous few lectures.

Starting from the basic gradient descent procedure, we first extended it to SGD for computational efficiency.

Then we introduced an adaptive learning rate schedule to reach AdaGrad.

We further extended it by introducing a moving average for the schedule in RMSProp.

Finally, we introduced momentum in the final expression for Adam, of course.
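As a practical aside (not part of the lecture itself), every optimizer in this chain is available off the shelf in tf.keras. The tiny model below is just a placeholder to show where the choice is made, and the hyperparameter values are only examples.

```python
import tensorflow as tf

# A toy model; the architecture and numbers are placeholders for illustration only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),
])

# The optimizers covered in these lessons, as implemented in tf.keras:
# optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # SGD with momentum
# optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.01)            # adaptive learning rate
# optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)           # moving-average schedule
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)                # adaptive moment estimation

model.compile(optimizer=optimizer, loss='mse')
```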

As with all science, data science is a long chain of academic research, each piece building on top of the others.

All right.

See you in our next lesson.

Thanks for watching.
