# Preprocess the data - create a validation dataset and scale the data

### English text of the lesson

Welcome back.

It’s time to extract the train data and the test data.

Lucky for us, there are built-in references that will help us achieve this, so: `mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']`.

But where is the validation dataset, you may ask?

Don’t worry, there’s an explanation.

By default, TensorFlow’s MNIST has training and testing datasets, but no validation dataset.

Sure, that’s one of the more irritating properties of the TensorFlow Datasets module, but in fact it gives us the opportunity to actually practice splitting datasets on our own.
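The unpacking of the train and test splits mentioned above can be sketched roughly as follows. To keep the sketch self-contained, `mnist_dataset` is a stand-in dictionary built from tiny synthetic data by a hypothetical helper, `make_split`. In the lesson it comes from the dataset loader, and the sizes used here are made up.

```python
import tensorflow as tf

# Hypothetical helper: builds a tiny synthetic split of (image, label) pairs.
# The sizes are made up; they only mimic the structure of the real data.
def make_split(num_examples):
    images = tf.zeros([num_examples, 28, 28, 1], dtype=tf.uint8)
    labels = tf.zeros([num_examples], dtype=tf.int64)
    return tf.data.Dataset.from_tensor_slices((images, labels))

# Stand-in for the dictionary of splits used in the lesson.
mnist_dataset = {'train': make_split(60), 'test': make_split(10)}

# Unpack the two built-in splits.
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# There is no 'validation' key, so that split has to be carved out by hand.
print(sorted(mnist_dataset.keys()))
print(int(mnist_train.cardinality()), int(mnist_test.cardinality()))
```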

Let’s do it.

As you can see, the train dataset is much bigger than the test one.

So we’ll take our validation data from the train dataset.

The easiest way to do that is to take an arbitrary percentage of the train dataset to serve as validation.

So let’s take 10 percent of it.

We should start by setting the number of validation samples.

Therefore, `num_validation_samples` equals 0.1 times the number of training samples. We can either count the number of training samples, or we can use the `mnist_info` variable we created earlier.

I go for the latter, as it is readily available information. We can extract the number of samples by writing `mnist_info.splits['train'].num_examples`.

OK, so we will get a number equal to the number of training samples divided by ten, right?

We are not sure that this will be an integer, though. It may be 10,000.2, which is not really a possible number of validation samples.

To solve this issue effortlessly, we can override the `num_validation_samples` variable with `tf.cast(num_validation_samples, tf.int64)`.

This will cast the value stored in the `num_validation_samples` variable to an integer, thereby preventing any potential issues.
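As a minimal sketch of this step, the count can be computed and cast like this. The training-set size is hardcoded to 60000 here as an assumption so that the snippet runs on its own. In the lesson it is read from `mnist_info.splits['train'].num_examples`.

```python
import tensorflow as tf

# Assumption: 60000 is the size of the MNIST training split. In the lesson
# this value is read from mnist_info.splits['train'].num_examples.
num_train_examples = 60000

# 10% of the training samples; the product is a Python float.
num_validation_samples = 0.1 * num_train_examples

# Cast to a 64-bit integer tensor so it can serve as a sample count.
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# For a non-whole value, the cast drops the fractional part.
truncated = tf.cast(10000.2, tf.int64)

print(num_validation_samples.numpy(), truncated.numpy())
```

Note that the cast truncates rather than rounds, which is harmless here because only a whole number of samples is needed.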

Great.

Now let’s also store the number of test samples in a dedicated variable.

Note that we’ve got them in `mnist_info`.

We can use the same approach and write `num_test_samples = mnist_info.splits['test'].num_examples`.

Then once more we cast it to `int64`. All right.

Normally, we would also like to scale our data in some way to make the result more numerically stable.

In this case, we will simply prefer to have inputs between 0 and 1.

With that said, let’s define a function called `scale` that will scale the inputs.

It will take an MNIST image and its label, so `def scale(image, label)`.

As a precaution, let’s make sure all values are floats, so we will cast the `image` local variable to `float32`.

Next, we’ll proceed by scaling it. As we already discussed, the MNIST images contain values from 0 to 255, representing the 256 shades of gray.

Therefore, if we divide each element by 255, we’ll get the desired result: all elements will be between 0 and 1.

Therefore, `image /= 255.` The dot at the end once again signifies that we want the result to be a float.

Finally we have to return the image and the original label.
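Put together, the function described above might look like the sketch below. The 2x2 "image" and the label value are made up purely to show the effect on the extreme pixel values; real MNIST images are larger.

```python
import tensorflow as tf

def scale(image, label):
    # Make sure the pixel values are floats before dividing.
    image = tf.cast(image, tf.float32)
    # Map the range [0, 255] onto [0, 1].
    image /= 255.
    # The label is passed through unchanged.
    return image, label

# Hypothetical tiny "image" containing the extreme pixel values, plus a dummy label.
image = tf.constant([[0, 255], [51, 102]], dtype=tf.uint8)
label = tf.constant(7, dtype=tf.int64)

scaled_image, same_label = scale(image, label)
print(scaled_image.numpy(), same_label.numpy())
```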

So this was a very specific function to write, right?

In fact, there is a TensorFlow method called `map` which allows us to apply a custom transformation to a given dataset.

Moreover, since each element of our dataset is an input paired with a label, the function we pass to `map` has to take an input and a label, and for our purposes it should return an input and a label.

That’s why we built our scale function this way.

Note that you can scale your data in other ways if you see fit.

Just make sure that the function takes image and label and returns image and label.

That way, you are simply transforming the values. OK.

So how do we implement this for our problem?

We’ve already decided we will take the validation data from `mnist_train`, so `scaled_train_and_validation_data = mnist_train.map(scale)`.

This will scale the whole train dataset and store it in our new variable.
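To illustrate how `map` applies the function element by element, here is a small sketch on synthetic data. The `mnist_train` below is a stand-in made of four random images, not the real dataset, and the result name `scaled_train_and_validation_data` is a reading of the variable name spoken in the lesson.

```python
import tensorflow as tf

def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

# Stand-in for mnist_train: four random 28x28x1 uint8 "images" with labels.
images = tf.cast(
    tf.random.uniform([4, 28, 28, 1], maxval=256, dtype=tf.int32), tf.uint8)
labels = tf.constant([0, 1, 2, 3], dtype=tf.int64)
mnist_train = tf.data.Dataset.from_tensor_slices((images, labels))

# map unpacks each (image, label) element into the two arguments of scale.
scaled_train_and_validation_data = mnist_train.map(scale)

# Inspect the value range of each scaled image.
for image, label in scaled_train_and_validation_data:
    print(label.numpy(),
          float(tf.reduce_min(image)),
          float(tf.reduce_max(image)))
```

The original dataset is left untouched: `map` returns a new dataset and applies the transformation as elements are read.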

Good job. For homework, you can try scaling the test data so it has the same scale as the train and validation data.

Let’s wrap it up here and continue with the preprocessing in the next lesson.

Thanks for watching.
