Scatterplots
- Reading time: 10 minutes
- Level: very hard
English transcript of the lesson
Now we can talk about scatterplots. When each item in the population can be classified simultaneously by two different numerical variables, this is a very easy situation to represent graphically. Suppose that for a set of houses in a particular neighborhood, we measured both the price of the house and its area in square feet: two numerical measurements for each house.
Then we could make our graph like this, and it would be very easy to see the relationship. So here, every dot is a particular house. The horizontal position of the dot indicates its area in square feet. The vertical position of the dot indicates its price. And so if we know the price of the house and we know the area in square feet, we can place that house on this graph.
Notice, interestingly, that the price starts at 0 and goes 0, one-half, 1, 1 and one-half, 2. It starts at 0, but the area doesn’t start at 0. The area actually starts at 500 and goes 500, 1000, so it has equal intervals; it just doesn’t start at 0. So as we might expect, as area goes up, price tends to go up.
This pattern is known as a positive correlation: when one variable increases, the other tends to increase. And by this increase, we mean that if you look at the sweep of the whole population, there’s this kind of upward trend of the whole mass of data points. That’s what constitutes a positive correlation: on average, as area increases, the price also increases.
It’s kind of an average statement, or a probabilistic statement. It’s very important to notice that a correlation is about an overall pattern in the data set as a whole. Correlation in the whole data set does not mean that the pattern will be obeyed between every possible pair of points in the set. In fact, it’s very easy to find pairs of individuals that don’t obey the general pattern.
So the general pattern is: area increases, so price increases. But suppose I look at that house and that house: well, as area increases, the price decreases. Or I could look at that house and that house, or at this house and this house. So it’s very easy to find individual pairs that don’t obey the pattern, but that’s not what a correlation is about.
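To make the point about individual pairs concrete, here is a minimal sketch that compares every pair of houses and counts how many follow the "bigger area, higher price" pattern and how many break it. The data values and the function name `count_pairs` are invented for illustration; they are not the houses from the lesson's graph.

```python
from itertools import combinations

# Hypothetical (area in square feet, price in $ millions) pairs.
# Made-up values for illustration, not read off the lesson's graph.
HOUSES = [
    (800, 0.6), (1200, 0.9), (1400, 1.1), (1600, 1.0),
    (2100, 1.5), (2300, 1.9), (2600, 2.2), (2800, 1.7),
]

def count_pairs(houses):
    """Return (pairs that follow the upward pattern, pairs that break it)."""
    follow = 0
    break_pattern = 0
    for (area1, price1), (area2, price2) in combinations(houses, 2):
        # Same sign of both differences: area and price move together.
        product = (area2 - area1) * (price2 - price1)
        if product > 0:
            follow += 1
        elif product < 0:
            break_pattern += 1
    return follow, break_pattern

follow, break_pattern = count_pairs(HOUSES)
print(follow, break_pattern)
```

With this made-up data, most pairs follow the upward pattern and only a few break it, which is what a positive correlation looks like: a trend in the whole set, with individual exceptions.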
A correlation is not about nitpicking and looking for individual pairs and the relationship between these pairs. It’s about a general pattern in the population as a whole. That’s what a correlation is. In general, two variables will be represented on the two axes.
Any variable may or may not start from a zero value. As we saw here, one of the variables did not start from zero. Each individual or item will be represented as a single dot, and its position will be determined by the value of each variable. So here, every house was a single dot, and its position was determined by its price and by its area in square feet.
Questions may ask about regions of the graph, that is, how many points are above or below such-and-such a value of one variable or the other variable? So, for example, how many houses cost more than $2 million? So we’d be looking at $2 million, and looking at the houses that lie above this line. And these points here in purple, these are the ones that are either at $2 million or a little bit more, and so there are six houses on this graph that cost more than $2 million.
Alternately, we could ask how many houses have an area less than 1,500 square feet? Well, here we would draw this vertical line. And we’d be looking to the left of this vertical line. And of course, it would be these houses here. So that’s seven houses that have an area less than 1,500 square feet. Here’s a practice question, pause the video, and then we’ll talk about this.
Okay, how many houses have an area of more than 2,000 square feet but cost less than $2 million? So here’s 2,000 square feet. They have to be over here and here is $2 million. So we’re gonna say that this point here is either at or a little above 2 million so we’re not gonna count that.
And it’s only one, two, three, four. Those are the points that fit these criteria. There are four houses that have an area of more than 2,000 square feet but cost less than $2 million. Some of the questions just involve counting the number of dots in certain rectangles of the graph.
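Counting dots in a rectangle of the graph amounts to filtering points by bounds on each variable. The sketch below shows that idea on made-up data; the house values and the helper `count_houses` are assumptions for illustration, so the counts it produces are not the answers to the lesson's questions.

```python
# Hypothetical (area in square feet, price in $ millions) pairs.
# Made-up values for illustration, not read off the lesson's graph.
HOUSES = [
    (800, 0.6), (1200, 0.9), (1400, 1.1), (1600, 1.0),
    (2100, 1.5), (2300, 1.9), (2600, 2.2), (2800, 1.7),
]

def count_houses(houses, min_area=None, max_area=None,
                 min_price=None, max_price=None):
    """Count houses strictly inside every bound given (None means no bound)."""
    count = 0
    for area, price in houses:
        if min_area is not None and area <= min_area:
            continue
        if max_area is not None and area >= max_area:
            continue
        if min_price is not None and price <= min_price:
            continue
        if max_price is not None and price >= max_price:
            continue
        count += 1
    return count

# Above the horizontal line at $2 million.
over_2m = count_houses(HOUSES, min_price=2.0)
# To the left of the vertical line at 1,500 square feet.
under_1500_sqft = count_houses(HOUSES, max_area=1500)
# Right of 2,000 square feet AND below $2 million: one rectangle of the graph.
big_but_under_2m = count_houses(HOUSES, min_area=2000, max_price=2.0)
print(over_2m, under_1500_sqft, big_but_under_2m)
```

The bounds are strict here ("more than", "less than"), so a point sitting exactly on a line is excluded; on a real graph you would judge that by eye, as the lesson does.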
The test also asks about correlation. A positive correlation, as we said, is the pattern in which, as one variable increases, on average the other also increases. A negative correlation is a pattern in which, as one variable increases, on average the other decreases. So here are some positive correlations.
All three of these demonstrate a positive correlation. The correlation strength varies from strong to weak, going left to right. The leftmost is near-perfect. So a straight line, all the points lying on a single straight line, that would be a perfect correlation. So what we have on that left graph, it’s not perfect but it’s near perfect.
It’s a very, very strong correlation. The middle one is relatively strong, so there’s very small variation from the line. Mostly those points are hugging the line very close to it. Whereas when we get to the one on the right, that’s a much weaker correlation. Yes, there’s a general pattern, but there’s a lot of variability now.
And the more variability there is, the harder it is to discern a particular pattern. Here are some negative correlations. Same thing: near perfect on the left, very strong in the middle, hugging the straight line, and then much more variability on the right, a much weaker correlation on the right. Truly uncorrelated data shows no upward or downward trend at all.
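The lesson describes correlation strength only visually. One common way to put a number on it, which the lesson itself does not mention, is Pearson's correlation coefficient r: values near +1 or -1 mean the points hug a straight line, and values near 0 mean little or no linear trend. The sketch below computes r by hand on made-up data; the data sets and the function name `pearson_r` are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
tight_upward = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]    # points hug a rising line
noisy_upward = [3.0, 1.0, 4.5, 2.0, 6.0, 3.5]    # rising, but lots of variability
tight_downward = [6.0, 5.1, 3.9, 3.2, 1.9, 1.1]  # points hug a falling line

r_strong = pearson_r(xs, tight_upward)
r_weak = pearson_r(xs, noisy_upward)
r_negative = pearson_r(xs, tight_downward)
print(r_strong, r_weak, r_negative)
```

On this data the tight upward set gives an r close to +1, the noisy set gives a clearly smaller positive r, and the tight downward set gives an r close to -1, matching the left-to-right ordering described for the graphs.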
Imagine just taking a handful of pennies, throwing them in the air and letting them land on the floor. Just a totally random array, that’s what we’re talking about when we talk about uncorrelated data. The graphs given on the test will not test the subtle boundaries between weak correlations and no correlations.
Either the graph will have a very clear correlation, or it will have none. Sometimes, to demonstrate the trend of a correlation, a graph will include something called a regression line, also known as a trend line or a best fit line. This shows the general trend, kind of abstracted into a single line, so you can see the core pattern. A regression line is a model used to make predictions about the relationship of the two variables.
So for example, if we pick this value of x and we want to predict what would we expect for the value of y, we’d expect something that was on the line. Now, with random fluctuations it might be slightly higher or slightly lower than the line, but on average we’d expect it to lie on the line. Points above the line have a y-variable higher than predicted. Points below the line have a y-variable lower than predicted.
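As a rough sketch of how a trend line makes predictions, the code below fits a line by ordinary least squares (one standard way to get a best fit line; the lesson does not say how its lines were computed) and then counts the points that lie above it. The body mass index and basal metabolic rate values, and the names `fit_line` and `predict`, are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical individuals: body mass index and basal metabolic rate.
# Made-up values, not the data from the lesson's practice question.
bmi = [20, 25, 30, 35, 40]
bmr = [1420, 1480, 1610, 1730, 1760]

slope, intercept = fit_line(bmi, bmr)

def predict(x):
    """Expected y-value on the trend line for a given x."""
    return intercept + slope * x

# Points above the line have a y-value higher than predicted.
above = sum(1 for x, y in zip(bmi, bmr) if y > predict(x))
below = sum(1 for x, y in zip(bmi, bmr) if y < predict(x))
print(predict(30), above, below)
```

Once the line exists, "how many individuals are higher than predicted?" reduces to comparing each y-value with the line's prediction at the same x, which is the same as counting the dots above the line.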
So here’s a practice question. Pause the video, and then we’ll talk about this. Okay, the trend line indicates, for a given body mass index, what the expected basal metabolic rate would be for the individual. So, given, say, a body mass index of 30, we’d expect a basal metabolic rate right there, a little above 1,600.
In this group, how many individuals have a basal metabolic rate higher than predicted? So really, this is just asking us to count the points above the line. Now, this is the funny thing about data interpretation questions. Often, when you actually interpret what a graphical question is asking you to do, it’s something so simple that a grade school student could do it.
When we phrase it like that, count the points above the line, a third grader could answer this question. Now, one, two, three, four, five, six, seven, that’s how many points are above the line, seven. So, I point this out merely to point out, don’t be afraid of the easy answer. Graphs are supposed to make things easy.
And so, once you interpret what a graph is asking it may be something incredibly easy that you actually have to do to get the answer. The whole trick is interpreting it so that you figure out what it is that you have to do. But don’t be afraid of the easy answer, graphs are supposed to make things easy. In summary, scatterplots visually display the relationships of two variables over a group of individuals.
The exact location of the dot indicates the value of the two different variables for that individual. Correlations, positive or negative, are patterns that pertain to a trend of the set as a whole. They’re pertaining to the whole set, the whole population, not a point-by-point comparison.
Regression lines, or trend lines, demonstrate the overall correlation pattern. They are predictive in nature, providing a model whereby a y-variable could be estimated from a new x-variable.