# Minimal example - part 4


### Brief description

• Study time: 0 minutes
• Level: very hard


### English transcript of the lesson

Hi again. It is time to train our model.

Since this is an iterative problem, we must create a loop which will apply our update rule and calculate the loss function.

I’ll use a for loop with 100 iterations to complete this task.

Let’s see the game plan we will follow at each iteration.

We will calculate the outputs and compare them to the targets through the loss function.

We will print the loss for each iteration so we know how the algorithm is doing.

Finally, we will adjust the weights and biases to get a better fit to the data.

At the next iteration these updated weights and biases will provide different outputs.

Then the procedure will be repeated.

All right, time to create the outputs.

They are given by the well-known linear model equation.

The outputs are equal to the inputs times the weights, plus the biases. Multiplying matrices requires the dot method.

I’ll use the NumPy one, so `np.dot` of inputs and weights.

Now, the dot product of the inputs times the weights is 1000 by 2 times 2 by 1, so it yields a 1000 by 1 matrix. When we add the bias, which is a scalar, it is added to each element of the output matrix.
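A minimal sketch of this step, to make the shapes concrete. The data here is hypothetical (1000 observations of two inputs drawn uniformly from -10 to 10, a fixed random seed, and an initialization range of 0.1 are all assumptions for illustration):

```python
import numpy as np

# Hypothetical setup with the lecture's shapes: 1000 observations, 2 inputs.
rng = np.random.default_rng(seed=42)
observations = 1000
xs = rng.uniform(-10, 10, size=(observations, 1))
zs = rng.uniform(-10, 10, size=(observations, 1))
inputs = np.column_stack((xs, zs))          # shape (1000, 2)

# Small random starting values; the range 0.1 is an assumption.
init_range = 0.1
weights = rng.uniform(-init_range, init_range, size=(2, 1))  # shape (2, 1)
biases = rng.uniform(-init_range, init_range)                # a single float

# Linear model: (1000, 2) dot (2, 1) -> (1000, 1); the scalar bias is
# broadcast, i.e. added to every element of the result.
outputs = np.dot(inputs, weights) + biases

print(inputs.shape, weights.shape, outputs.shape)  # (1000, 2) (2, 1) (1000, 1)
```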

OK, for simplicity, let’s declare a variable called deltas, which will record the difference between the outputs and the targets.

We already introduced such a variable in the gradient descent lecture: deltas equals outputs minus targets.

That’s useful, as it is a part of the update rule.
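A sketch of the deltas computation. The targets are hypothetical: the true function 2x - 3z + 5 plus uniform noise in [-1, 1] is an assumption made here for illustration (the lecture only states later that the bias should come out close to 5):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
observations = 1000
inputs = rng.uniform(-10, 10, size=(observations, 2))

# Hypothetical targets: 2x - 3z + 5 plus small noise (assumed, not from the lecture).
noise = rng.uniform(-1, 1, size=(observations, 1))
targets = inputs @ np.array([[2.0], [-3.0]]) + 5 + noise

weights = rng.uniform(-0.1, 0.1, size=(2, 1))
biases = rng.uniform(-0.1, 0.1)
outputs = np.dot(inputs, weights) + biases

# deltas: element-wise difference between the model's outputs and the targets
deltas = outputs - targets
print(deltas.shape)  # (1000, 1)
```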

Then we must calculate the loss.

We said we will use half the L2 norm loss.

Python-wise speaking, deltas is a 1000 by 1 array.

We are interested in the sum of its terms squared, following the formula for the L2 norm loss.

There is a NumPy method called sum which will allow us to sum all the values in the array. The L2 norm requires these values to be squared.

So the code looks like this: `np.sum(deltas ** 2)`.

We then divide the whole expression by two to get the elegant update rules from gradient descent.

Let’s further augment the loss by dividing it by the number of observations we have.

This would give us the average loss per observation or the mean loss.
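A small worked example of the loss, using a hypothetical 3-element deltas array instead of the 1000-element one so the arithmetic can be checked by hand:

```python
import numpy as np

# Hypothetical deltas for 3 observations (the lecture has 1000).
deltas = np.array([[1.0], [-2.0], [3.0]])
observations = deltas.shape[0]

sum_of_squares = np.sum(deltas ** 2)            # 1 + 4 + 9 = 14
half_l2 = sum_of_squares / 2                    # 7: half the L2 norm loss
loss = np.sum(deltas ** 2) / 2 / observations   # 7 / 3: mean loss per observation

print(sum_of_squares, half_l2, loss)
```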

Similarly to the division by 2, this does not change the logic of the loss function.

It is still lower for more accurate results.

This little improvement makes the learning independent of the number of observations. Instead of adjusting the learning rate, we adjust the loss.

That’s valuable, as the same learning rate should give us similar results for both 1000 and 1 million observations.
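A quick numerical illustration of why the division by the number of observations helps. The errors are hypothetical, drawn from the same uniform distribution for both dataset sizes: the plain half-L2 loss grows roughly 1000-fold, while the mean version stays near the same value (about 1/6 for uniform errors in [-1, 1]):

```python
import numpy as np

def half_l2(deltas):
    """Half the sum of squared deltas."""
    return np.sum(deltas ** 2) / 2

def mean_half_l2(deltas):
    """Half the sum of squared deltas, divided by the number of observations."""
    return np.sum(deltas ** 2) / 2 / deltas.shape[0]

rng = np.random.default_rng(seed=1)
# Same hypothetical error distribution, two dataset sizes.
small = rng.uniform(-1, 1, size=(1_000, 1))
large = rng.uniform(-1, 1, size=(1_000_000, 1))

print(half_l2(small), half_l2(large))            # grows with the dataset size
print(mean_half_l2(small), mean_half_l2(large))  # both close to 1/6
```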

Once again that’s something we’ll discuss in more detail later in the course.

We’ll print the loss we’ve obtained at each step.

That’s done as we want to keep an eye on whether it is decreasing as iterations are performed.

If it is decreasing, our machine learning algorithm functions well.

Finally, we must update the weights and biases so they are ready for the next iteration.

Using the same rescaling trick, I’ll also rescale the deltas.

This is yet another way to make the algorithm more universal.

So the new variable is `deltas_scaled`, and it equals deltas divided by observations.
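A sketch of what the rescaling does to an update step, shown on the bias step (the sum of the deltas). The errors are hypothetical (centred on 0.5) and the learning rate of 0.02 is an assumed value: without rescaling the step grows with the number of observations, while with `deltas_scaled` it equals the learning rate times the mean delta for both dataset sizes:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
learning_rate = 0.02   # assumed value
raw_steps, scaled_steps = {}, {}

for observations in (1_000, 1_000_000):
    # Hypothetical errors centred on 0.5, so the scaled step should be about 0.02 * 0.5 = 0.01
    deltas = rng.uniform(-1, 1, size=(observations, 1)) + 0.5
    # Unscaled: the step grows with the number of observations
    raw_steps[observations] = learning_rate * np.sum(deltas)
    # Rescaled: equivalent to learning_rate * mean(deltas)
    deltas_scaled = deltas / observations
    scaled_steps[observations] = learning_rate * np.sum(deltas_scaled)

print(raw_steps)
print(scaled_steps)
```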

Let’s update the weights.

The new weights are equal to the old weights minus the learning rate times the dot product of the inputs and `deltas_scaled`.

The shape of the weights is 2 by 1, the shape of the inputs is 1000 by 2, and that of `deltas_scaled` is 1000 by 1.

Obviously we cannot simply multiply the inputs and the deltas.

This is an issue that may arise occasionally due to the linear algebra involved. To fix it, we must transpose the inputs matrix using the `.T` attribute of the array (`inputs.T`).

Now the matrices are compatible.

2 by 1000 times 1000 by 1 is equal to 2 by 1.
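A sketch of the weights update, including the shape error the lecture mentions. The data-generating function and the learning rate of 0.02 are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
observations = 1000
inputs = rng.uniform(-10, 10, size=(observations, 2))                 # (1000, 2)
noise = rng.uniform(-1, 1, size=(observations, 1))
targets = inputs @ np.array([[2.0], [-3.0]]) + 5 + noise              # hypothetical targets
weights = rng.uniform(-0.1, 0.1, size=(2, 1))                         # (2, 1)
biases = rng.uniform(-0.1, 0.1)
learning_rate = 0.02                                                  # assumed value

outputs = np.dot(inputs, weights) + biases
deltas_scaled = (outputs - targets) / observations                    # (1000, 1)

# (1000, 2) dot (1000, 1): the inner dimensions (2 and 1000) do not match.
raised = False
try:
    np.dot(inputs, deltas_scaled)
except ValueError as err:
    raised = True
    print("shape error:", err)

# Transposing gives (2, 1000) dot (1000, 1) -> (2, 1), the same shape as weights.
gradient = np.dot(inputs.T, deltas_scaled)
weights = weights - learning_rate * gradient
print(gradient.shape, weights.shape)  # (2, 1) (2, 1)
```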

I’d like to spare an extra thought on that.

Often, when dealing with matrices, you find the correct way to code it through dimensionality checks and compatibility errors.

However, transposing matrices doesn’t affect the information they hold, so we can do it freely.

All right, let’s update the biases.

The new biases are equal to the old biases minus the learning rate times the sum of the (scaled) deltas, as explained earlier.
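A sketch that checks both update directions against a finite-difference estimate of the mean loss gradient, and confirms that one step lowers the loss. The central-difference check is a technique added here for verification, not part of the lecture; the data, the dataset size of 200, and the learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
observations = 200
inputs = rng.uniform(-10, 10, size=(observations, 2))
noise = rng.uniform(-1, 1, size=(observations, 1))
targets = inputs @ np.array([[2.0], [-3.0]]) + 5 + noise   # hypothetical targets
weights = rng.uniform(-0.1, 0.1, size=(2, 1))
biases = rng.uniform(-0.1, 0.1)
learning_rate = 0.02                                        # assumed value

def mean_loss(w, b):
    """Half the L2 norm loss, divided by the number of observations."""
    d = np.dot(inputs, w) + b - targets
    return np.sum(d ** 2) / 2 / observations

deltas_scaled = (np.dot(inputs, weights) + biases - targets) / observations
grad_w = np.dot(inputs.T, deltas_scaled)   # direction used in the weights update
grad_b = np.sum(deltas_scaled)             # direction used in the biases update

# Central finite differences as an independent check of both gradients.
eps = 1e-6
num_w = np.zeros_like(weights)
for i in range(weights.shape[0]):
    step = np.zeros_like(weights)
    step[i, 0] = eps
    num_w[i, 0] = (mean_loss(weights + step, biases) - mean_loss(weights - step, biases)) / (2 * eps)
num_b = (mean_loss(weights, biases + eps) - mean_loss(weights, biases - eps)) / (2 * eps)
print(np.allclose(grad_w, num_w, rtol=1e-4), np.isclose(grad_b, num_b, rtol=1e-4))

# One gradient descent step on both parameters
new_weights = weights - learning_rate * grad_w
new_biases = biases - learning_rate * grad_b
print(mean_loss(new_weights, new_biases) < mean_loss(weights, biases))
```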

This is the entire algorithm.

Let’s recap what it does. First, it calculates the outputs for given weights and biases.

Second, it calculates a loss function that compares the outputs to the targets.

Third, it prints the loss, so we can later analyze it.

And fourth, we update the weights and the biases following the gradient descent methodology.
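The four steps above can be sketched as one loop. This is a minimal version under stated assumptions, not the course's own notebook: the targets come from a hypothetical function 2x - 3z + 5 plus uniform noise, and the learning rate (0.02), initialization range (0.1), and seed are assumed values:

```python
import numpy as np

# Hypothetical data: 1000 observations of two inputs, targets from 2x - 3z + 5 plus noise (assumed).
rng = np.random.default_rng(seed=0)
observations = 1000
xs = rng.uniform(-10, 10, size=(observations, 1))
zs = rng.uniform(-10, 10, size=(observations, 1))
inputs = np.column_stack((xs, zs))
noise = rng.uniform(-1, 1, size=(observations, 1))
targets = 2 * xs - 3 * zs + 5 + noise

init_range = 0.1        # assumed
weights = rng.uniform(-init_range, init_range, size=(2, 1))
biases = rng.uniform(-init_range, init_range)
learning_rate = 0.02    # assumed

losses = []
for i in range(100):
    # 1. Outputs for the current weights and biases
    outputs = np.dot(inputs, weights) + biases
    # 2. Loss: half the L2 norm, averaged over observations
    deltas = outputs - targets
    loss = np.sum(deltas ** 2) / 2 / observations
    # 3. Print the loss so we can watch it decrease
    print(loss)
    losses.append(loss)
    # 4. Gradient descent update using the rescaled deltas
    deltas_scaled = deltas / observations
    weights = weights - learning_rate * np.dot(inputs.T, deltas_scaled)
    biases = biases - learning_rate * np.sum(deltas_scaled)
```

With this (assumed) learning rate, the printed values should decrease at every iteration and flatten out, as described next.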

Let’s run the code.

What we get is a list of numbers that appears to be in descending order, right?

These are the values of our average loss function.

It started from a high value, and at each iteration it became lower and lower until it reached a point where it almost stopped changing.

This means we have minimized, or almost minimized, the loss function with respect to the weights and biases.

Therefore, we have found a linear function that fits the data well. The weights and the biases are optimized.

But so are the outputs, since the optimization process has ended.

We can check these values here.

We observe the values from the last iteration of the for loop, the one that gave us the lowest loss function. In the memory of the computer, the weights, biases, and outputs variables are optimized as of now.

Congratulations, you learned how to create your first machine learning algorithm.

Still, let’s spend an extra minute on that.

I’d like to print the weights and the biases. The weights seem about right.

The bias is close to 5, as we wanted, but not really.

That’s because we used too few iterations or an inappropriate learning rate.

Let’s rerun the code for the loop.

This will continue optimizing the algorithm for another hundred iterations.

We can see the bias improves when we increase the number of iterations.
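A sketch of what rerunning the loop cell does: training resumes from the current weights and biases rather than starting over. The helper `train` is hypothetical (the lecture simply reruns the notebook cell), and the data and learning rate are assumptions:

```python
import numpy as np

def train(inputs, targets, weights, biases, learning_rate, iterations):
    """Hypothetical helper: run `iterations` gradient descent steps from the given parameters."""
    observations = inputs.shape[0]
    for _ in range(iterations):
        deltas = np.dot(inputs, weights) + biases - targets
        deltas_scaled = deltas / observations
        weights = weights - learning_rate * np.dot(inputs.T, deltas_scaled)
        biases = biases - learning_rate * np.sum(deltas_scaled)
    return weights, biases

rng = np.random.default_rng(seed=0)
observations = 1000
inputs = rng.uniform(-10, 10, size=(observations, 2))
noise = rng.uniform(-1, 1, size=(observations, 1))
targets = inputs @ np.array([[2.0], [-3.0]]) + 5 + noise   # hypothetical targets
weights = rng.uniform(-0.1, 0.1, size=(2, 1))
biases = rng.uniform(-0.1, 0.1)

# First run of the loop (100 iterations, assumed learning rate 0.02)
weights, biases = train(inputs, targets, weights, biases, 0.02, 100)
bias_after_100 = biases
# "Rerunning the cell": another 100 iterations starting from the current values
weights, biases = train(inputs, targets, weights, biases, 0.02, 100)
bias_after_200 = biases
print(bias_after_100, bias_after_200)
```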

We strongly encourage you to play around with the code and find the optimal number of iterations for the problem.

Try different values for the number of observations, the learning rate, the number of iterations, and maybe even the initial range for initializing the weights and biases. Cool.
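One possible way to organize such experiments is sketched below. `run_experiment` is a hypothetical helper, and the data, seed, and candidate learning rates are assumptions. With this data, a very small rate (0.001) barely moves the bias in 100 iterations, while a large one (0.1) makes the loss blow up instead of decreasing:

```python
import numpy as np

def run_experiment(observations=1000, learning_rate=0.02, iterations=100, init_range=0.1, seed=0):
    """Hypothetical helper: generate data, train, return the final mean loss and parameters."""
    rng = np.random.default_rng(seed)
    inputs = rng.uniform(-10, 10, size=(observations, 2))
    noise = rng.uniform(-1, 1, size=(observations, 1))
    targets = inputs @ np.array([[2.0], [-3.0]]) + 5 + noise   # assumed true function
    weights = rng.uniform(-init_range, init_range, size=(2, 1))
    biases = rng.uniform(-init_range, init_range)
    for _ in range(iterations):
        deltas = np.dot(inputs, weights) + biases - targets
        deltas_scaled = deltas / observations
        weights = weights - learning_rate * np.dot(inputs.T, deltas_scaled)
        biases = biases - learning_rate * np.sum(deltas_scaled)
    deltas = np.dot(inputs, weights) + biases - targets
    loss = np.sum(deltas ** 2) / 2 / observations
    return loss, weights, biases

results = {}
for lr in (0.001, 0.02, 0.1):
    loss, weights, biases = run_experiment(learning_rate=lr)
    results[lr] = loss
    print(f"learning_rate={lr}: final loss={loss:.4g}, bias={float(biases):.3f}")
```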

Finally I’d like to show you the plot of the output at the last iteration against the targets.

The closer this plot is to a 45 degree line the closer the outputs are to the targets.
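The lecture shows this as a plot. As a numerical stand-in, the sketch below fits a straight line through the (target, output) pairs from the last iteration: a slope near 1 and a correlation near 1 mean the points lie close to a 45 degree line. The data and hyperparameters are assumptions, and the intercept mostly reflects how far the bias still is from its target value:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
observations = 1000
inputs = rng.uniform(-10, 10, size=(observations, 2))
noise = rng.uniform(-1, 1, size=(observations, 1))
targets = inputs @ np.array([[2.0], [-3.0]]) + 5 + noise   # hypothetical targets
weights = rng.uniform(-0.1, 0.1, size=(2, 1))
biases = rng.uniform(-0.1, 0.1)
learning_rate = 0.02  # assumed

for _ in range(100):
    outputs = np.dot(inputs, weights) + biases
    deltas_scaled = (outputs - targets) / observations
    weights = weights - learning_rate * np.dot(inputs.T, deltas_scaled)
    biases = biases - learning_rate * np.sum(deltas_scaled)

# `outputs` holds the values from the last iteration, as in the lecture's plot.
slope, intercept = np.polyfit(targets.ravel(), outputs.ravel(), 1)
correlation = np.corrcoef(targets.ravel(), outputs.ravel())[0, 1]
print(f"slope={slope:.3f}, intercept={intercept:.3f}, correlation={correlation:.4f}")
```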

Obviously, our model worked like a charm.

All right.

This was the last lesson from our first big topic. From now on, we will start with more complicated stuff.

If you have any doubts about your knowledge so far, please revisit the lessons and make use of all the extra resources available, like course notes, exercises, and Jupyter notebooks.

In addition, feel free to post in the course Q&A section.

We love hearing from you. Oh, and one more thing.

If you like the course so far.