# Dealing with categorical data

### English text of the lesson

So far, most of what we’ve seen were examples of numerical variables: exchange rates, trading volume, security prices, and so on.

Often, though, we must deal with categorical data. In short, categorical data refers to groups or categories, such as our cat and dog examples.

But a machine learning algorithm takes only numbers as values, doesn’t it?

Therefore, the question when working with categorical data is how to convert the cat category into a number so we can input it into a model, or output it in the end.

Obviously, a different number should be associated with each category, right? Or better, a tensor? We’re getting closer.

Imagine our shop has three products: bread, yogurt, and muffins.

Now, how do we convert these categories to numbers?

A possible solution could be to enumerate them like this:

Bread equals one, yogurt equals two, muffins equals three.

Unfortunately, this implies there is some order.

It’s like saying that a muffin is more than a yogurt, which is more than bread.

If we instead had three prices, \$1, \$2, and \$3, three times \$1 is equal to \$3. Using the same logic, does it make any sense to you that three times bread equals one muffin?
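
The enumeration described above can be sketched in a few lines of Python. This is only an illustration of the problem: the `product_codes` dictionary and the variable names are assumptions made for the example, not something defined in the lesson.

```python
# Hypothetical integer codes for the three products (illustrative names).
product_codes = {"bread": 1, "yogurt": 2, "muffins": 3}

# The codes are ordinary integers, so comparisons silently impose an order.
muffins_beat_yogurt = product_codes["muffins"] > product_codes["yogurt"]
yogurt_beats_bread = product_codes["yogurt"] > product_codes["bread"]

# Arithmetic is also defined, although it is meaningless for products.
three_breads_is_a_muffin = 3 * product_codes["bread"] == product_codes["muffins"]

print(muffins_beat_yogurt, yogurt_beats_bread, three_breads_is_a_muffin)
```

All three checks come out true, which is exactly the trouble: a model that receives these numbers has no way to know the ordering and the arithmetic are accidents of the encoding.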

There is another level of ambiguity: to get from bread to muffins, we always go through yogurt. Ultimately, what we have done is assume the data has some order while it hasn’t.
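
The "going through yogurt" effect shows up when we measure distances between the integer codes. The following is a minimal sketch, assuming the codes bread = 1, yogurt = 2, muffins = 3 from the lesson; the `code_distance` helper is a made-up name for illustration.

```python
# Hypothetical integer codes, matching the enumeration in the lesson.
product_codes = {"bread": 1, "yogurt": 2, "muffins": 3}

def code_distance(a: str, b: str) -> int:
    """Absolute difference between the integer codes of two products."""
    return abs(product_codes[a] - product_codes[b])

bread_to_yogurt = code_distance("bread", "yogurt")
yogurt_to_muffins = code_distance("yogurt", "muffins")
bread_to_muffins = code_distance("bread", "muffins")

# Yogurt sits exactly between bread and muffins on the number line,
# so the long distance is the sum of the two short ones.
print(bread_to_yogurt, yogurt_to_muffins, bread_to_muffins)
```

Bread ends up twice as far from muffins as from yogurt, even though nothing about the products justifies that.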

Typically that’s an issue when our data is divided into categories.

Think about the products in a shop, about different car brands, or about people. So our question becomes: how do we encode such categories in a way that will be useful for a machine learning algorithm?