Learning rate schedules
Hi again! We introduced the concept of hyperparameters 20 or 30 lessons ago.
I'm sure you remember: parameters are the weights and the biases, while hyperparameters are things like the width and depth of the algorithm, that is, the number of hidden units and the number of hidden layers. It is up to us to choose their values.
We mentioned we should play around with hyperparameters to find the best ones for our algorithm and the data at hand. This lesson will focus on another hyperparameter: the learning rate, eta.
What do we know so far?
It must be small enough so we gently descend through the loss function, instead of oscillating wildly around the minimum and never reaching it, or diverging to infinity.
It also has to be big enough so the optimization takes place in a reasonable amount of time.
In the Excel file we provided on gradient descent for one parameter, you can play around with the learning rate.
Moreover, in one of the exercises that came with the minimal example, you had the same chance, but for the linear model. OK, we're doing science here.
So these phrases, small enough and big enough, are too vague. A smart way to deal with choosing the proper learning rate is adopting a so-called learning rate schedule. Learning rate schedules get the best of both worlds: small enough and big enough.
The rationale is the following.
We start from a high initial learning rate.
This leads to faster training; in this way, we approach the minimum faster.
Then we want to lower the rate gradually as training goes on.
Around the end of the training, we want a very small learning rate, so we get an accurate solution.
How are learning rate schedules implemented in practice?
There are two basic ways to do that.
The simplest one is setting a pre-determined, piecewise constant learning rate.
For example, we can use a learning rate of 0.1 for the first five epochs, then 0.01 for the next five, and 0.001 until the end.
This causes the loss function to converge much faster to the minimum and will give us an accurate result.
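To make this concrete, here is a minimal sketch of such a piecewise constant schedule in TensorFlow/Keras. The boundaries follow the example above, while steps_per_epoch is a hypothetical value that depends on your dataset and batch size; it is not part of the lesson.

import tensorflow as tf

# Hypothetical number of batches per epoch; depends on your data and batch size.
steps_per_epoch = 100

# 0.1 for the first 5 epochs, 0.01 for the next 5, and 0.001 until the end.
# PiecewiseConstantDecay expects boundaries in optimizer steps, not epochs,
# so we convert epochs to steps.
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[5 * steps_per_epoch, 10 * steps_per_epoch],
    values=[0.1, 0.01, 0.001],
)

optimizer = tf.keras.optimizers.SGD(learning_rate=schedule)
# model.compile(optimizer=optimizer, loss='mean_squared_error')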
However, considering what we've learned so far, this seems too simple to be the norm, right?
Indeed, it is crude, as it requires us to know approximately how many epochs it will take the loss to converge. Still, beginners may want to use it, as it makes a great difference compared to a constant learning rate.
OK, a second, much smarter approach is the exponential schedule. The exponential schedule is a much better alternative, as it smoothly reduces, or decays, the learning rate.
We usually start from a high value, such as eta naught equal to 0.1.
Then we update the learning rate at each epoch using the rule in this expression, where n is the current epoch and c is a constant.
Here's the sequence of learning rates that would follow for a c equal to 20.
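As a sketch, assuming the rule on the slide is the common exponential decay eta(n) = eta_0 * exp(-n / c), with n the current epoch, you can compute that sequence and plug the rule into training through a Keras callback (the model.fit line is only a commented-out usage hint):

import numpy as np
import tensorflow as tf

ETA_0 = 0.1  # initial learning rate eta_0
C = 20.0     # decay constant c

def exponential_schedule(epoch, lr):
    # eta(n) = eta_0 * exp(-n / c), where n is the current epoch
    return ETA_0 * np.exp(-epoch / C)

# The first few learning rates for c = 20
for epoch in range(5):
    print(epoch, exponential_schedule(epoch, None))

# Apply the rule during training with a Keras callback
lr_callback = tf.keras.callbacks.LearningRateScheduler(exponential_schedule)
# model.fit(x_train, y_train, epochs=100, callbacks=[lr_callback])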
There is no rule for the constant c, but usually it should be of the same order of magnitude as the number of epochs needed to minimize the loss.
For example, if we need 100 epochs, values of c from 50 to 500 are all fine.
If we need 1,000, values from 500 to 5,000 are alright.
Usually, we'll need much fewer epochs, so a value of c around 20 or 30 works well.
However, from my personal experience, the exact value of c doesn't matter as much.
What makes a big difference is the presence of the learning rate schedule itself.
C is also a hyperparameter.
As with all hyperparameters, it may make a difference for your particular problem.
You can try different values of c and see if this affects the results you obtain.
It's worth pointing out that all those cool new improvements, such as learning rate schedules and momentum, come at a price: we pay the price of increasing the number of hyperparameters for which we must pick values.
Generally, the rule-of-thumb values work well, but bear in mind that for some specific problem of yours, they may not.
It's always worth it to explore several hyperparameter values before sticking with one. OK.
This will do for now.
Thanks for watching.