Correlation & Regression (DP IB Maths: AA HL)
Revision Note
Linear Regression
What is linear regression?
- If strong linear correlation exists on a scatter diagram then the data can be modelled by a linear model
- Drawing lines of best fit by eye is not the best method as it can be difficult to judge the best position for the line
- The least squares regression line is the line of best fit that minimises the sum of the squares of the gaps between the line and the data values
- It can be calculated by looking at either:
- vertical distances between the line and the data values
- This is the regression line of y on x
- horizontal distances between the line and the data values
- This is the regression line of x on y
How do I find the regression line of y on x?
- The regression line of y on x is written in the form y = ax + b
- a is the gradient of the line
- It represents the change in y for each individual unit change in x
- If a is positive this means y increases by a for a unit increase in x
- If a is negative this means y decreases by |a| for a unit increase in x
- b is the y-intercept
- It shows the value of y when x is zero
- You are expected to use your GDC to find the equation of the regression line
- Enter the bivariate data and choose the model “ax + b”
- Remember the mean point will lie on the regression line
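A GDC does this calculation for you, but a minimal Python sketch can show what sits behind the "ax + b" model. It assumes the standard least squares results a = S_xy / S_xx and b = (mean of y) - a × (mean of x), which are not stated in this note. The function name and the data are made up for illustration.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = ax + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx          # gradient: change in y per unit change in x
    b = y_bar - a * x_bar    # intercept: forces the line through the mean point
    return a, b


# Made-up illustrative data (not from the note)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_y_on_x(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print("a =", a, "b =", b)
# The mean point should lie on the line, so this difference should be about 0
print("check at mean point:", (a * x_bar + b) - y_bar)
```

Because b is defined from the means, the mean point lies on the line by construction, which is why the final check comes out at (approximately) zero.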
How do I find the regression line of x on y?
- The regression line of x on y is written in the form x = cy + d
- c is the gradient of the line
- It represents the change in x for each individual unit change in y
- If c is positive this means x increases by c for a unit increase in y
- If c is negative this means x decreases by |c| for a unit increase in y
- d is the x-intercept
- It shows the value of x when y is zero
- You are expected to use your GDC to find the equation of the regression line
- It is found the same way as the regression line of y on x but with the two data sets switched around
- Remember the mean point will lie on the regression line
How do I use a regression line?
- The regression line can be used to decide what type of correlation there is if there is no scatter diagram
- If the gradient is positive then the data set has positive correlation
- If the gradient is negative then the data set has negative correlation
- The regression line can also be used to predict the value of a dependent variable from an independent variable
- The equation for the y on x line should only be used to make predictions for y
- Using a y on x line to predict x is not always reliable
- The equation for the x on y line should only be used to make predictions for x
- Using an x on y line to predict y is not always reliable
- Making a prediction within the range of the given data is called interpolation
- This is usually reliable
- The stronger the correlation the more reliable the prediction
- Making a prediction outside of the range of the given data is called extrapolation
- This is much less reliable
- The prediction will be more reliable if the number of data values in the original sample set is bigger
- The y on x and x on y regression lines intersect at the mean point
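The points above can be sketched in Python. This is an illustration under stated assumptions rather than an exam method: the helper names `least_squares_line` and `is_interpolation` are hypothetical, the data is made up, and the gradient formula (sum of products of deviations divided by sum of squared deviations) is the standard least squares result rather than something given in this note.

```python
def least_squares_line(inputs, outputs):
    """Gradient and intercept of the least squares line predicting outputs from inputs."""
    n = len(inputs)
    in_bar = sum(inputs) / n
    out_bar = sum(outputs) / n
    s_io = sum((i - in_bar) * (o - out_bar) for i, o in zip(inputs, outputs))
    s_ii = sum((i - in_bar) ** 2 for i in inputs)
    gradient = s_io / s_ii
    return gradient, out_bar - gradient * in_bar


def is_interpolation(value, data):
    """True if value lies within the range of the given data."""
    return min(data) <= value <= max(data)


# Made-up illustrative data (not from the note)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)  # y on x: y = ax + b, use it to predict y
c, d = least_squares_line(ys, xs)  # x on y: x = cy + d, use it to predict x

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Both lines should pass through the mean point (x_bar, y_bar)
on_y_on_x = abs((a * x_bar + b) - y_bar) < 1e-9
on_x_on_y = abs((c * y_bar + d) - x_bar) < 1e-9
print(on_y_on_x, on_x_on_y)

# Predict y at x = 2.5 (inside the data range); x = 9 would be extrapolation
y_pred = a * 2.5 + b
print(y_pred, is_interpolation(2.5, xs), is_interpolation(9, xs))
```

Swapping the two data sets in the call is all that changes between the two lines, which mirrors how the x on y line is found on a GDC.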
Exam Tip
- Once you calculate the values of a and b store them in your GDC
- This means you can use the full display values rather than the rounded values when using the linear regression equation to predict values
- This avoids rounding errors
Worked example
The table below shows the scores of eight students for a maths test and an English test.
Maths (x) | 7 | 18 | 37 | 52 | 61 | 68 | 75 | 82
English (y) | 5 | 3 | 9 | 12 | 17 | 41 | 49 | 97
a) Write down the value of Pearson’s product-moment correlation coefficient, r.
b) Write down the equation of the regression line of y on x, giving your answer in the form y = ax + b where a and b are constants to be found.
c) Write down the equation of the regression line of x on y, giving your answer in the form x = cy + d where c and d are constants to be found.
d) Use the appropriate regression line to predict the score on the maths test of a student who got a score of 63 on the English test.
PMCC
What is Pearson’s product-moment correlation coefficient?
- Pearson’s product-moment correlation coefficient (PMCC) is a way of giving a numerical value to a linear relationship of bivariate data
- The PMCC of a sample is denoted by the letter r
- r can take any value such that -1 ≤ r ≤ 1
- A positive value of r describes positive correlation
- A negative value of r describes negative correlation
- r = 0 means there is no linear correlation
- r = 1 means perfect positive linear correlation
- r = -1 means perfect negative linear correlation
- The closer to 1 or -1 the stronger the correlation
How do I calculate Pearson’s product-moment correlation coefficient (PMCC)?
- You will be expected to use the statistics mode on your GDC to calculate the PMCC
- The formula can be useful to deepen your understanding
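- $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
- $S_{xy} = \sum xy - \frac{1}{n} \sum x \sum y$ is linked to the covariance
- $S_{xx} = \sum x^2 - \frac{1}{n} \left(\sum x\right)^2$ and $S_{yy} = \sum y^2 - \frac{1}{n} \left(\sum y\right)^2$ are linked to the variances
- You do not need to learn this as using your GDC will be expected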
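For anyone who wants to see the calculation a GDC performs, here is a minimal Python sketch. It assumes the standard form of the formula, r = S_xy / sqrt(S_xx × S_yy), with S_xy = Σxy - (Σx)(Σy)/n, S_xx = Σx² - (Σx)²/n and S_yy = Σy² - (Σy)²/n. The function name `pmcc` and the test data are made up for illustration.

```python
from math import sqrt


def pmcc(xs, ys):
    """Pearson's product-moment correlation coefficient from raw sums."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n  # linked to the covariance
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n                   # linked to the variance of x
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n                   # linked to the variance of y
    return s_xy / sqrt(s_xx * s_yy)


# Made-up data lying exactly on a straight line
r_up = pmcc([1, 2, 3, 4], [2, 4, 6, 8])     # perfect positive linear correlation
r_down = pmcc([1, 2, 3, 4], [8, 6, 4, 2])   # perfect negative linear correlation
print(r_up, r_down)
```

The two test cases lie exactly on straight lines, so they should give the extreme values r = 1 and r = -1.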
When does the PMCC suggest there is a linear relationship?
- Critical values of r indicate when the PMCC would suggest there is a linear relationship
- In your exam you will be given critical values where appropriate
- Critical values will depend on the size of the sample
- If the absolute value of the PMCC is bigger than the critical value then this suggests a linear model is appropriate
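The comparison in the last point can be written as a one-line check. In this sketch the function name is hypothetical and the critical value of 0.7 is made up purely for illustration; in an exam you would use the critical value you are given for the sample size.

```python
def suggests_linear_model(r, critical_value):
    """True if the absolute value of the PMCC is bigger than the critical value."""
    return abs(r) > critical_value


critical_value = 0.7  # hypothetical value, not taken from any table

strong_negative = suggests_linear_model(-0.85, critical_value)
weak_positive = suggests_linear_model(0.4, critical_value)
print(strong_negative)  # True
print(weak_positive)    # False
```

Taking the absolute value means a strong negative correlation is treated in the same way as a strong positive one.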