Range and Standard Deviation
So far, we have discussed measures of center, mean, median, and mode.
These give us a rough idea of where the middle of a list is, or what the most typical members of the set are.
Often, we also want to know how far the numbers on the list are from each other.
That is, how spread out they are.
Now you may wonder, why is this important?
Well, suppose we have this scenario: a town of 100 households that has an average income of $100,000 per year.
And we could ask the question, what kind of town is this?
In general, are the people in this town rich or poor? What’s going on in the town? What’s the story of the town?
We might ask, for example, how do the households in that town compare to the median household income in the United States, which is $51,900?
Well, scenario number one would be, all 100 households have exactly $100,000 in annual income.
And so that means that every single one is about double the median for the United States.
So this would be a town of great prosperity.
This is a town where everyone is doing quite well.
So that is one story.
Here’s a different story. Scenario number two: 50 households have $40,000, which is below the median in the United States, and the other 50 have $160,000.
So, 50 of them are doing very well.
And then, the other 50 are really kind of struggling.
So you get this split: about half the town is struggling, and half the town is doing well.
That’s a very different story.
Now consider this story.
99 households have exactly $5,000 in annual income.
So that would be extreme poverty.
99 of the households in the town are living in abject poverty, and one household is making over $9 million a year.
So they still would have the same average, but notice that this is a very, very different story from the stories we’ve had before.
So, in other words, knowing just the measure of center, and nothing about the spread, does not tell us what kind of town this is.
So all three of these scenarios have the same average, but the spread between the numbers becomes greater, and greater, from one to three.
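The three scenarios can be checked numerically. What follows is a minimal Python sketch, not part of the lesson. The variable names are ours, and the single high income in scenario three is not stated in the lesson; it is the value implied by keeping the average at $100,000 across 100 households.

```python
from statistics import mean

# Scenario 1: all 100 households earn exactly $100,000.
town_1 = [100_000] * 100
# Scenario 2: 50 households at $40,000 and 50 at $160,000.
town_2 = [40_000] * 50 + [160_000] * 50
# Scenario 3: 99 households at $5,000; the last income is derived
# (an assumption) so that the average stays at $100,000.
top_income = 100 * 100_000 - 99 * 5_000
town_3 = [5_000] * 99 + [top_income]

for name, town in [("town_1", town_1), ("town_2", town_2), ("town_3", town_3)]:
    # Same mean every time, but a growing gap between max and min.
    print(name, mean(town), max(town) - min(town))
```

The mean is identical in all three towns, while the gap between the richest and poorest household grows from one scenario to the next.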
Spread is an important factor in understanding the story of any set of numbers.
And, in fact, in the larger picture, this is exactly what statistics is about.
Statistics is trying to figure out the story of a set of numbers: what does this set of numbers actually mean?
That’s what statistics is trying to do, and this is an example of it.
Just as we discussed measures of center, we will also discuss a couple measures of spread.
Now, these two terms, measure of center and measure of spread, are not terms you need to know for the test.
These are only terms we’re using in this video just to categorize the ideas that we’re discussing.
The first measure of spread that we’ll look at is the range.
Range is by far the simplest, and least sophisticated measure of spread.
The range is simply the max minus the min.
So, in other words, we take the highest number minus the lowest number, that’s the range.
This measure of spread only tells us about the difference between the extreme highest value and the extreme lowest value; it doesn’t tell us where most of the points in between are.
So, consider this: suppose we have a set of ten numbers with a mean of 25 and a range of 20, and we want to know what kind of story we have here.
Well, story number one is that most of the values are at the mean, and we just have a couple of outliers.
So, that’s one example.
Another scenario is that everyone’s an outlier.
You just have five numbers at the extreme low end, and five numbers at the extreme high end.
And then we could also have something like this, where most of the values are close together, but there’s one high outlier.
And you could imagine, alternatively, arranging it so that most of the numbers are close together, but there’s one low outlier.
So there is a variety of different stories that we could have, given the same mean and the same range.
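To make these stories concrete, here is a small Python sketch. The four sets are hypothetical (the lesson shows its sets only on screen); each was built to have ten values, a mean of 25, and a range of 20. The helper name `value_range` is ours.

```python
from statistics import mean

def value_range(values):
    """Range as defined in the lesson: the max minus the min."""
    return max(values) - min(values)

# Hypothetical sets of ten; each is built to have mean 25 and range 20.
stories = {
    "mostly at the mean, two outliers": [15] + [25] * 8 + [35],
    "everyone is an outlier": [15] * 5 + [35] * 5,
    "one high outlier": [23] * 9 + [43],
    "one low outlier": [7] + [27] * 9,
}

for label, values in stories.items():
    print(label, mean(values), value_range(values))
```

All four sets report the same mean and the same range, even though the values are arranged very differently.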
Range is quite easy to calculate, but the information it gives us is quite limited.
Notice that the range takes into account only two values, the max and the min.
A more sophisticated measure of spread would take into account every value, just as the mean takes into account every value.
And this more sophisticated measure of spread is, in fact, the standard deviation.
In order to discuss what standard deviation is, we need to discuss the idea of a deviation from the mean.
If we take any list and subtract the mean from every number on the list, we get a new list.
And this new list is the list of deviations from the mean.
So for example, here is a very simple list.
The mean is 5, so we subtract 5 from every number on the list.
And we get this new list; these are the deviations.
So 1 has a deviation of negative 4, because it is 4 units below 5.
7 has a deviation of 2, because it is 2 above 5.
Those are the deviations from the mean.
Notice that numbers below the mean have a negative deviation, numbers above the mean have a positive deviation.
Notice also, if we simply took a numerical average of the deviations, that average would always be zero.
So what we’d like is a typical size of the deviations, but we can’t simply average the deviation list, because that always gives zero.
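The list of deviations can be sketched in a few lines of Python. The list below is hypothetical: the transcript does not show the lesson’s list, so this one was chosen only to have a mean of 5 and to contain the 1 and the 7 mentioned above.

```python
from statistics import mean

# Hypothetical list: chosen only so that its mean is 5 and it
# contains the 1 and the 7 mentioned in the lesson.
values = [1, 4, 7, 8]
m = mean(values)

deviations = [x - m for x in values]
print(deviations)        # each number minus the mean
print(sum(deviations))   # the negative and positive deviations cancel
```

Because the mean is the sum divided by the count, the deviations always add up to zero, which is why their plain average cannot measure spread.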
Standard deviation is a way to measure the size of a typical deviation.
In other words, we are asking the question, how far away from the mean are the individual points?
The standard deviation is a representative answer to that question.
In fact, it’s the best single answer we could give to that question.
I’ll talk a little more about that in later videos in this module.
But the standard deviation is fundamentally answering that question.
How far away from the mean are the individual data points?
What’s a typical distance from the mean?
The actual technical calculation for standard deviation is a bit complicated, and will be covered in the next lesson.
Here are some important, quick and dirty facts about standard deviation.
If you know these facts, you’ll know almost everything you need to know about standard deviation for the test.
Fact number one.
The standard deviation can only be positive or zero, never negative.
It’s a distance after all, and distance can be positive, it can be 0, but it can’t be negative.
Fact number two, the only way standard deviation can equal zero, is if all the numbers on the list are identical to one another.
So let’s think about this.
Suppose we had a list like this.
All the numbers are equal.
Every entry equals 7, so of course the mean equals 7, and if we subtract 7 from every number on the list we’re just gonna get a set of zeros.
The set of deviations is just a list of zeros.
So the standard deviation has to be zero.
A typical deviation is 0, so the standard deviation equals 0.
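Facts one and two can be checked with Python’s standard library. This sketch assumes the population standard deviation (`statistics.pstdev`), since the lesson leaves the actual calculation for later; the length of the list is also our choice.

```python
from statistics import pstdev

# Hypothetical length; the lesson only says every entry equals 7.
all_sevens = [7, 7, 7, 7, 7]

# Every deviation from the mean is 0, so the result is 0.
print(pstdev(all_sevens))
```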
Fact number three: this is a bit of a trickier one.
If all the numbers on the list are exactly the same distance from the mean, that distance is the standard deviation.
So this could be a little harder to see.
But if we look at a list like this, here’s the mean, the mean is 5.
Notice that every number on that list is a distance of three away from the mean.
Two is three less than the mean, and eight is three more than the mean. On the number line, if we measured the distance from each number to the mean, every single number would have the same distance, three, so the standard deviation of the set has to be three.
Because every single entry is a distance of three away from the mean.
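A quick numerical check of fact three, as a sketch. The list is hypothetical (some 2s and some 8s with a mean of 5), and we assume the population standard deviation, `statistics.pstdev`.

```python
import math
from statistics import mean, pstdev

# Hypothetical list of 2s and 8s with mean 5.
values = [2, 2, 2, 8, 8, 8]

distances = [abs(x - mean(values)) for x in values]
print(distances)                          # every distance from the mean is 3
print(math.isclose(pstdev(values), 3))    # the standard deviation matches it
```

Note that this holds for the population form. The sample form, `statistics.stdev`, divides by one less than the count and would give a slightly larger number here.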
This is a rare fact that might be tested on the hardest quant problems.
Fact number four is a more applicable fact.
So this is not about a calculation, but just about comparing standard deviations.
A set with most numbers clustered toward the extremes will have a higher standard deviation than a list with most values equal to or close to the mean.
So for example, suppose we’re looking at these two sets.
Notice that in set A, most of those numbers are at the mean.
Eight of the ten numbers are at the mean of the set, so most of them are clustered toward the center.
Set B, everything’s clustered toward the edges.
You have a bunch of very low values and a bunch of very high values.
Now for set B, we have a set where every number is a distance of ten away from the mean.
So for set B, we can actually figure out the standard deviation.
It has to be ten, because all the numbers are ten away from the mean.
For set A, we don’t need to be able to calculate the standard deviation.
In fact, most times when you have to compare standard deviations, it’s not a matter of doing a calculation.
Instead, we simply notice that in B, most of the numbers are far away from the mean.
In A, most of the numbers are close to the mean.
In fact, most of the numbers are equal to the mean.
And so that means A is going to have a much lower standard deviation, because most of its numbers are very close to the mean.
In fact, if we thought in terms of a list of deviations, eight of those ten numbers have a deviation of zero.
And so, having a list with mostly zeroes, that’s going to bring down the value of the standard deviation.
So we don’t need to calculate what it is, but we certainly know it is less than ten.
And again, when you’re asked to compare standard deviation, rarely are you asked to perform a calculation.
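The comparison can be illustrated with a short sketch. Both sets below are hypothetical stand-ins for the lesson’s on-screen sets: ten values each, a shared mean of 50, with eight of ten values at the mean in A and every value 10 away from the mean in B. We assume the population standard deviation, `statistics.pstdev`.

```python
from statistics import mean, pstdev

# Hypothetical sets of ten, both with mean 50.
set_a = [40] + [50] * 8 + [60]   # eight of ten values sit at the mean
set_b = [40] * 5 + [60] * 5      # every value is 10 away from the mean

print(mean(set_a), mean(set_b))
print(pstdev(set_a) < pstdev(set_b))   # A is less spread out than B
```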
Fact number five: if we add the same number to every number on the list, or subtract the same number from every number on the list, the standard deviation doesn’t change. So we look at these three lists.
We start with set A: 1, 2, 3, 4, 5, 6.
We add 40 to every number on the list, or we start with A and add 71 to every number on the list.
In all three cases, we just have a set of six consecutive integers.
So those always have the same spacing.
In other words, the steps are 1, 1, 1, 1, 1.
Those are the steps as we move across any of those three sets.
So these three sets, and in fact any set of six consecutive integers, have exactly the same standard deviation.
And again, calculating that standard deviation is not important.
What’s important is that you can compare these, and see immediately that they have to have the same standard deviation.
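Here is a minimal check of this fact in Python. The names `set_b` and `set_c` are our labels for the two shifted lists, and we assume the population standard deviation, `statistics.pstdev`.

```python
import math
from statistics import pstdev

set_a = [1, 2, 3, 4, 5, 6]
set_b = [x + 40 for x in set_a]   # 41 through 46
set_c = [x + 71 for x in set_a]   # 72 through 77

sd_a, sd_b, sd_c = pstdev(set_a), pstdev(set_b), pstdev(set_c)
# Shifting every value by the same amount leaves the spread unchanged.
print(math.isclose(sd_a, sd_b) and math.isclose(sd_a, sd_c))
```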
Similarly let’s start with this.
Now this is an asymmetrical set.
Notice the steps between the numbers: from two to three is one, from three to five is two, from five to eight is three.
From eight to 13 is five, and from 13 to 21 is eight.
So the steps are 1, 2, 3, 5, and 8.
In set E, all we’ve done is add 30 to every number on the list.
So the spacings are still exactly the same, so this has the same standard deviation.
Now this one’s a little bit trickier.
It doesn’t actually look the same, but notice that the steps are in the reverse order.
If we start at 78, then we’re stepping down by 1, then 2, then 3, then 5, then 8.
So we’re taking the steps in the reverse order but the same steps are still there.
In fact, here is one way to think about set F.
If we subtract each number in D from 80, so 80 minus 2, 80 minus 3, 80 minus 5, and so on, that produces the numbers in F.
And all three of these have the same standard deviation.
Standard deviation has only to do with the space in between the numbers, not where they are on the number line.
Imagine the numbers of a set as dots on a number line.
We can slide that set of dots up and down and even reflect it.
As long as the space in between the dots stays the same, the standard deviation stays the same.
Start with these dots.
We slide them up.
Notice these spacings stay the same. We can even reflect it.
Now the spacings go from right to left instead of left to right, but they are still the same.
That’s what standard deviation is measuring, the spacing between the numbers.
Where they are on the number line or whether they go left to right, or right to left, does not matter at all.
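Sliding and reflecting can both be checked with sets D, E, and F as described above. This is a sketch: the helper `steps` is ours, and we assume the population standard deviation, `statistics.pstdev`.

```python
import math
from statistics import pstdev

set_d = [2, 3, 5, 8, 13, 21]
set_e = [x + 30 for x in set_d]   # slide: 32, 33, 35, 38, 43, 51
set_f = [80 - x for x in set_d]   # reflect: 78, 77, 75, 72, 67, 59

def steps(values):
    """Gaps between neighbouring values after sorting."""
    ordered = sorted(values)
    return [b - a for a, b in zip(ordered, ordered[1:])]

# D and E share the same steps; F has the same steps in reverse order.
print(steps(set_d), steps(set_e), steps(set_f))
print(math.isclose(pstdev(set_d), pstdev(set_e)))
print(math.isclose(pstdev(set_d), pstdev(set_f)))
```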
Fact number six: if we multiply every number on a list by a positive number K, the standard deviation also gets multiplied by K.
So start out with this list here.
It has a mean, and we’ll just call its standard deviation Q; the actual value doesn’t matter.
Now multiply everything on the list by three.
And notice what happens: when we multiply by 3, the spacings get multiplied by 3.
Between 3 and 5 is a spacing of 2, while between 9 and 15 is a spacing of 6,
which is that spacing multiplied by 3.
Between 7 and 11, we have a spacing of 4.
Between 21 and 33, we have a spacing of 12.
That spacing, again, has been multiplied by 3.
So the mean gets multiplied by 3.
And the standard deviation also gets multiplied by 3.
So if you multiply every number on a list by a positive number K, both the mean and the standard deviation get multiplied by that number.
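The scaling fact can be sketched the same way. The list is hypothetical: the transcript mentions the values 3, 5, 7, and 11 but does not show the full list, so we use just those four. We assume the population standard deviation, `statistics.pstdev`.

```python
import math
from statistics import mean, pstdev

# Hypothetical list; the lesson mentions 3, 5, 7 and 11 but the
# transcript does not show the full list.
values = [3, 5, 7, 11]
k = 3
scaled = [k * x for x in values]   # 9, 15, 21, 33

# Both the mean and the standard deviation scale by the same factor k.
print(math.isclose(mean(scaled), k * mean(values)))
print(math.isclose(pstdev(scaled), k * pstdev(values)))
```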
Measures of spread tell us how far apart numbers on a list are from each other.
The range is the max minus the min.
If all the numbers on the list are identical, then the standard deviation is 0.
If all the numbers on the list are the same distance from the mean, then the standard deviation equals that distance.
Lots of points close to the mean give you a smaller standard deviation, lots of points far away from the mean give you a larger standard deviation.
If we add the same number to, or subtract the same number from, every number on the list, that doesn’t change the standard deviation at all.
And if we multiply a list by some positive number K, then the standard deviation gets multiplied by that number.