# One-hot and binary encoding


### English text of the lesson

Hi and welcome back.

Let’s begin this lesson by introducing binary encoding.

We will start from the ordinal numbers we assigned earlier.

Bread is represented by the number one, yogurt by the number two, and muffins by the number three.

Binary encoding implies we should turn these numbers into binary. One in binary is 01.

So bread would be 01. Two in binary is 10.

So yogurt would be 10.

Three in binary is 11.

So muffins would be 11.
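The conversion just described can be sketched in a few lines of Python. This is only an illustration: the dictionary `ordinal` and the helper `to_binary` are names made up here, not something the lesson defines.

```python
# Ordinal labels taken from the lesson.
ordinal = {"bread": 1, "yogurt": 2, "muffins": 3}


def to_binary(n, width):
    """Return n as a zero-padded binary string of the given width."""
    return format(n, f"0{width}b")


# Two digits are enough because the largest label, 3, is 11 in binary.
width = max(ordinal.values()).bit_length()

binary_codes = {name: to_binary(n, width) for name, n in ordinal.items()}
print(binary_codes)  # {'bread': '01', 'yogurt': '10', 'muffins': '11'}
```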

The next step of the process is to divide these into different columns, as if we were creating two new variables.

For the first variable, bread is zero.

Yogurt is one.

And muffins are one.

For the second variable, bread is one and yogurt is zero.

And muffins are one.
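As a rough sketch of this column split, each binary digit can be placed in its own variable. The helper `split_columns` and the column names `var1` and `var2` are hypothetical, chosen only for this example.

```python
# Binary codes for the three products, as given in the lesson.
binary_codes = {"bread": "01", "yogurt": "10", "muffins": "11"}


def split_columns(codes):
    """Turn each binary string into a list of 0/1 integers, one per column."""
    return {name: [int(bit) for bit in code] for name, code in codes.items()}


columns = split_columns(binary_codes)

# Show the two new variables side by side for every product.
for name, (var1, var2) in columns.items():
    print(f"{name:8s} var1={var1} var2={var2}")
```

Reading the output row by row gives the two variables described above: bread is 0 and 1, yogurt is 1 and 0, muffins are 1 and 1.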

We have differentiated between the three categories and have removed the order.

However, there are still some implied correlations between them.

For instance, bread and yogurt seem exactly the opposite of each other.

It's like we are saying whatever is bread is not yogurt and vice versa.

Even if this makes sense, if we encode them in a different way, this opposite correlation would be true for muffins and yogurt but no longer for bread.

Therefore, binary encoding proves problematic, but it is a great improvement over the initial ordinal method.
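To see how the implied "opposite" relationship depends on the numbering, here is a small sketch that compares two label assignments. The second assignment (`reassigned`) is a hypothetical alternative picked for illustration, and the helper names are made up for this example.

```python
def binary_encode(ordinal, width=2):
    """Map each category to a list of binary digits of its ordinal label."""
    return {
        name: [int(bit) for bit in format(n, f"0{width}b")]
        for name, n in ordinal.items()
    }


def opposite_pairs(codes):
    """Return pairs of categories whose codes differ in every column."""
    names = sorted(codes)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if all(x != y for x, y in zip(codes[a], codes[b]))
    ]


# Numbering used in the lesson.
original = binary_encode({"bread": 1, "yogurt": 2, "muffins": 3})
# Hypothetical alternative numbering of the same three products.
reassigned = binary_encode({"bread": 3, "yogurt": 1, "muffins": 2})

print(opposite_pairs(original))    # [('bread', 'yogurt')]
print(opposite_pairs(reassigned))  # [('muffins', 'yogurt')]
```

The products did not change, only the arbitrary numbers did, yet a different pair now looks like exact opposites.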

All right. Finally, we have the so-called one-hot encoding. One-hot is very simple and widely adopted.

It consists of creating as many columns as there are possible values.

Here we have three products, thus we need three columns, or three variables.

Let's call them bread, yogurt, and muffins.

Each column answers a question: is this product bread, is this product yogurt, and is this product muffins? One means yes.

Zero means no.

So for a product that is bread we will have 1 0 0, for a product that is yogurt 0 1 0, and for a product that is muffins 0 0 1.

This is very intuitive, as a product can only be of one type at the same time.

Thus there will be only one value 1, and everything else will be zeros.

This means the products are uncorrelated and unequivocal, which is useful and usually works like a charm.
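A minimal one-hot sketch in plain Python could look like the following. The function `one_hot` is a hypothetical helper written for this example; in practice a library routine would usually do this job.

```python
# One column per possible value, in a fixed order.
categories = ["bread", "yogurt", "muffins"]


def one_hot(value, categories):
    """Return a list with a single 1 in the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


encoded = {name: one_hot(name, categories) for name in categories}
for name, vector in encoded.items():
    print(f"{name:8s} {vector}")
```

Every vector contains exactly one 1, so no product is numerically "closer" to another.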

Many lessons ago we were talking about classifying cats, dogs, and horses.

The target vectors there were one-hot encoded.

So we had the same type of vectors.

There is one big problem with one-hot encoding, though. One-hot encoding requires a lot of new variables.

For example, Ikea offers around 12,000 products.

Do we want to include 12,000 columns in our inputs?

Definitely not. If we use binary, the 12,000 products would be represented by 16 columns only, since the 12,000th product would be written like this in binary.

This is exponentially lower than the 12,000 columns we would need for one-hot encoding.
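The column counts can be checked with a short calculation. This is a sketch under the assumption that products are labeled 1 through 12,000; the variable names are made up here. By this count 14 binary digits already suffice, so the 16 columns mentioned above are more than enough.

```python
n_products = 12_000

# One-hot encoding needs one column per product.
one_hot_columns = n_products

# Binary encoding needs as many columns as the largest label has binary digits.
binary_columns = n_products.bit_length()

print(one_hot_columns)          # 12000
print(binary_columns)           # 14
print(format(n_products, "b"))  # 10111011100000
```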

In such cases we must use binary even though that would introduce some unjustified correlations between

the products.

Clearly there is a tradeoff between binary and one-hot encoding. We would prefer one-hot when we have a few categories and binary when dealing with many categories.

All right.

That was all.

Thanks for watching.
