More on Boxplots
- Reading time: 5 minutes
- Level: hard
So in the last video we talked about the basics of the box plot. The box plot shows, in visual form, five important numbers: the minimum, the first quartile, the median, the third quartile, and the maximum. Now we can talk about why the box plot has that particular shape. Of course, the distinctive thing is the box itself, that box right in the middle, the feature that gives this figure its name.
Yes, we need to show the positions of the five vertical lines. We talked about the importance of those last time. But why are the middle three connected by a box? What’s up with that? Think about the numbers that make up this box: the first quartile, the median, and the third quartile.
Of course, 25% of the population is below the first quartile, and 25% is above the third quartile. Those are the two regions outside of the box: 25% on one side and 25% on the other side. So that means the box itself is 50%, the 50% of the population between Q1 and Q3.
That is the middle 50% of the population. This is what the box on a boxplot represents. This can be a confusing idea, so let’s take it very slowly. Below the median is the lower 50% of the data, and above the median is the upper 50% of the data, but between the two quartiles is the middle 50% of the data. The upper and lower 50% regions are likely to include outliers and extreme values.
The lower 50% goes all the way down to the minimum. The upper 50% goes all the way up to the maximum. And the max and the min are where you’re likely to find outliers and extreme values. The middle 50% is the 50% closest to the median, to the middle of the list.
That’s the 50% at the exact middle of the population. So that’s a very representative slice of the population that does not include any outliers at all. The size of this region, the size of the middle 50% given by Q3 minus Q1, is called the interquartile range or the IQR. So that is the range of the middle 50% of the data.
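As a rough illustration of the definition IQR = Q3 minus Q1, here is a minimal Python sketch. The data list is hypothetical, chosen only for illustration, and the choice of `method="inclusive"` in `statistics.quantiles` is an assumption: there are several conventions for computing quartiles, and they can give slightly different answers on small sets.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 5, 7, 8, 9, 11, 20]

# statistics.quantiles with n=4 returns the three quartile cut points.
# Assumption: the "inclusive" method; other conventions can give
# slightly different quartiles for the same data.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_numbers = (min(data), q1, median, q3, max(data))
iqr = q3 - q1  # length of the box: the spread of the middle 50%

print("five-number summary:", five_numbers)
print("IQR:", iqr)
```

For this list the quartiles come out to 4, 7, and 9, so the IQR is 5, even though the maximum of 20 sits far from the rest of the values.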
The IQR is also the horizontal length of the box in a box plot. The advantage of this measure of spread is that it reflects the spread of the most typical values, not the extremes.
Here’s a practice problem. Pause the video and then we’ll talk about this. Okay. In the above distribution, the total range of values equals K times the interquartile range. Find K. Well, fortunately we can read everything here. The minimum appears to be around 15. The max appears to be around 95. So 95 minus 15 is 80. That’s the range, max minus min.
Now for the interquartile range: it looks like the first quartile is around 40, and the third quartile is around 60. So the IQR, 60 minus 40, is 20. And so K is the number we multiply 20 by to get 80. 20 times K equals 80. So, of course, K is 80 divided by 20, which equals 4.
And what we’re really saying is that the whole length of this box plot, from left side to right side, is about 4 times as long as the length of the box in the middle. In other words, the distance from the lowest score to the highest score, that total range, is four times as large as the spread across the middle 50% of the data.
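The arithmetic in this practice problem can be written out as a short Python sketch. The four values are the approximate readings quoted above (15, 40, 60, 95); the variable names are only illustrative.

```python
# Approximate values read off the box plot in the practice problem.
minimum, q1, q3, maximum = 15, 40, 60, 95

total_range = maximum - minimum  # 95 - 15 = 80
iqr = q3 - q1                    # 60 - 40 = 20
k = total_range / iqr            # 80 / 20 = 4.0

print(total_range, iqr, k)
```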
Now notice that for the above population, I was saying 25% of the population is below Q1, and 50% is below the median. I was making these broad percentage statements. These statements are not always true. For example, here we have a set with only a few values. The median is five, Q1 is five, and only one number is below that value.
So the percentage statements certainly do not work. Remember that quartiles and boxplots are tools designed to make sense of a large population: something the size of a whole country, for example, with hundreds of millions of data points. On that scale, repetitions and other irregularities happen across too small a range to matter.
So yeah, if we look across the whole population, there are gonna be coincidences where maybe two different people have the same income, or something like that. But those small repetitions are gonna be too small to matter in the grand sweep of the whole population. On that scale, the percentage statements about the median, the quartiles, and the IQR are accurate for all practical purposes.
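The contrast between a tiny set and a large population can be checked with a small Python sketch. Everything here is an assumption made for illustration: the small list is a hypothetical set with many repeated values (not the set shown in the video), the "population" is 200,000 simulated draws from a normal distribution, and quartiles are computed with `statistics.quantiles` using the `"inclusive"` method, which is only one of several conventions.

```python
import random
import statistics


def fraction_below(values, cutoff):
    """Fraction of values strictly below the cutoff."""
    return sum(v < cutoff for v in values) / len(values)


# Hypothetical small set with many repeats (not the set from the video).
small = [1, 5, 5, 5, 5, 5, 5, 9]
small_q1, small_median, small_q3 = statistics.quantiles(
    small, n=4, method="inclusive"
)
small_below_q1 = fraction_below(small, small_q1)  # only the 1 is below Q1

# Simulated large "population"; the seed just makes the run repeatable.
random.seed(0)
large = [random.gauss(50, 15) for _ in range(200_000)]
large_q1, large_median, large_q3 = statistics.quantiles(
    large, n=4, method="inclusive"
)
large_below_q1 = fraction_below(large, large_q1)
large_in_box = sum(large_q1 <= v <= large_q3 for v in large) / len(large)

print(f"small set: Q1={small_q1}, median={small_median}, "
      f"below Q1={small_below_q1:.1%}")
print(f"large set: below Q1={large_below_q1:.1%}, "
      f"inside the box={large_in_box:.1%}")
```

In the small set, Q1 and the median are both 5 and only one of the eight values (12.5%) lies below Q1, so the "25% below Q1" statement fails. In the simulated population, the share below Q1 and the share inside the box should land very close to 25% and 50%.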
So the box of the boxplot is the middle 50% of the population, and its length is called the interquartile range. The IQR equals Q3 minus Q1. The percent interpretations of the median and the quartiles work only if the data set is the size of a real population. In other words, if you’re looking at a set that has ten members or something like that, of course the percentages here are not gonna work exactly.
They’re not supposed to. They’re supposed to work at a population level.