# Softmax activation

Lesson 6

### Brief description

• Study time: 0 minutes
• Level: very hard


### English transcript of the lesson

Let’s continue exploring this table, which contains mostly Greek letters. We said the softmax function has no definite graph. So why is this function different? If we take a careful look at its formula, we see that the key difference between this function and the others is that it takes as an argument the whole vector a, instead of individual elements.

So the softmax of the element at position i is equal to the exponential of that element, divided by the sum of the exponentials of all elements of the vector.
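As a minimal sketch, the formula just described can be written in plain Python (the name `softmax` and the list-based interface are illustrative choices, not from the lesson):

```python
import math

def softmax(a):
    # Exponentiate each element of the input vector.
    exps = [math.exp(x) for x in a]
    # Divide every exponential by the sum of all of them,
    # so each output depends on the entire vector.
    total = sum(exps)
    return [e / total for e in exps]
```

Note that, unlike sigmoid or tanh, this cannot be applied element by element: the shared denominator couples all the outputs together.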

So while the other activation functions take an input value and transform it regardless of the other elements, the softmax considers the information about the whole set of numbers we have.

Time for an example.

Let a be equal to xw + b, which is our well-known linear model. Then we can say the output y will be the softmax of a.

Let’s look at a hidden layer with three units. So a here is equal to hw + b. After transforming it through a linear combination, we obtain a vector with three elements: -0.21, 0.47, and 1.72.

Now, if we used a different activation, such as the sigmoid, we would simply apply the formula to each of the three numbers, and we would obtain a new vector containing three new numbers. But softmax is special.

Each element in the output depends on the entire set of elements of the input.

Let’s find the softmax of a. First, I’ll calculate the denominator. It is given by e to the power of -0.21, plus e to the power of 0.47, plus e to the power of 1.72. That’s approximately 8. Then we must divide each exponential by this denominator to get the new vector. The result is 0.1, 0.2, and 0.7.
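The arithmetic in this example can be checked with a few lines of Python (a sketch that reproduces the lesson’s numbers, rounded to one decimal place):

```python
import math

a = [-0.21, 0.47, 1.72]

# Denominator: the sum of the exponentials of all three elements.
exps = [math.exp(x) for x in a]
denominator = sum(exps)  # approximately 8, as stated in the lesson

# Divide each exponential by the denominator.
y = [e / denominator for e in exps]
print([round(v, 1) for v in y])  # → [0.1, 0.2, 0.7]
```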

This is our output layer. OK.

A key aspect of the softmax transformation is that the values it outputs are in the range from 0 to 1, and their sum is exactly 1. What else has such a property? Probabilities? Yes, probabilities indeed.

The point of the softmax transformation is to take a bunch of arbitrarily large or small numbers that come out of previous layers and fit them into a valid probability distribution.

This is extremely important and useful.

Remember our example with cats, dogs, and horses we saw earlier? One photo was described by a vector containing 0.1, 0.2, and 0.7. We promised we would tell you how to obtain that. Well, that’s how: through a softmax transformation. We kept our promise. Now that we know we are talking about probabilities, we can comfortably say we are 70 percent certain the image is a picture of a horse.

This makes everything so intuitive and useful that the softmax activation is often used as the activation of the final output layer in classification problems.

So no matter what happens before the final output of the algorithm is a probability distribution.

All right, this will do for now.

Thanks for watching.
