Linear Regression (DP IB Maths: AI HL)

Revision Note

Author: Dan

Expertise: Maths


Linear Regression

What is linear regression?

  • If strong linear correlation exists on a scatter diagram then the data can be modelled by a linear model
    • Drawing lines of best fit by eye is not the best method as it can be difficult to judge the best position for the line
  • The least squares regression line is the line of best fit that minimises the sum of the squares of the vertical distances between the line and each data value
    • This is usually called the regression line of y on x
    • It can be calculated by looking at the vertical distances between the line and the data values
  • The regression line of y on x is written in the form y = ax + b
  • a is the gradient of the line
    • It represents the change in y for each individual unit change in x
      • If a is positive this means y increases by a for a unit increase in x
      • If a is negative this means y decreases by |a| for a unit increase in x
  • b is the y-intercept
    • It shows the value of y when x is zero
  • You are expected to use your GDC to find the equation of the regression line
    • Enter the bivariate data and choose the model “ax + b”
    • Remember the mean point (x̄, ȳ) will lie on the regression line
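
The GDC does this calculation for you, but it can help to see what it is doing. The following is a minimal Python sketch, assuming the standard least squares formulas a = Sxy / Sxx and b = ȳ − a x̄. The function names and the five data points are made up for illustration and are not part of the original note.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line of y on x, y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx measure how x and y vary about their means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx          # gradient
    b = y_bar - a * x_bar    # y-intercept, chosen so the mean point lies on the line
    return a, b


def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances between the data and y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not from the note)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

a, b = least_squares_line(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print("a =", a, "b =", b)
print("mean point on line:", abs((a * x_bar + b) - y_bar) < 1e-9)
print("sum of squares:", sum_of_squares(xs, ys, a, b))
```

For this data the line works out to y = 0.8x + 1.8. Nudging a or b away from these values makes the sum of squares larger, which is what "least squares" means.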

How do I use a regression line?

  • The equation of the regression line can be used to decide what type of correlation there is if there is no scatter diagram
    • If a is positive then the data set has positive correlation
    • If a is negative then the data set has negative correlation
  • The equation of the regression line can also be used to predict the value of a dependent variable (y) from an independent variable (x)
    • The equation should only be used to make predictions for y
      • Using a y on x line to predict x is not always reliable
    • Making a prediction within the range of the given data is called interpolation
      • This is usually reliable
      • The stronger the correlation the more reliable the prediction
    • Making a prediction outside of the range of the given data is called extrapolation
      • This is much less reliable
    • The prediction will be more reliable if the number of data values in the original sample set is bigger
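
The points above can be sketched in Python. This is only an illustration: the function names, the coefficients a = 0.8 and b = 1.8, and the data range 1 to 5 are assumptions chosen for the example, not values from the note.

```python
def correlation_type(a):
    """Read the type of correlation from the sign of the gradient a."""
    if a > 0:
        return "positive"
    if a < 0:
        return "negative"
    return "none"


def predict_y(x, a, b, x_min, x_max):
    """Predict y from x using y = a*x + b and say what kind of prediction it is."""
    y = a * x + b
    # Inside the range of the original x data: interpolation (usually reliable).
    # Outside it: extrapolation (much less reliable).
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Hypothetical regression line and data range
a, b = 0.8, 1.8
x_min, x_max = 1, 5

print(correlation_type(a))
print(predict_y(3, a, b, x_min, x_max))
print(predict_y(8, a, b, x_min, x_max))
```

Note that the function only predicts y from x. Rearranging a y on x line to predict x from y is not always reliable.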

Exam Tip

  • Once you calculate the values of a and b, store them in your GDC
    • This means you can use the full display values rather than the rounded values when using the linear regression equation to predict values
    • This avoids rounding errors
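
To see why this matters, here is a small Python sketch comparing a prediction made with full-precision coefficients against one made with coefficients rounded to 3 significant figures. The coefficients, the x value and the helper round_sig are all hypothetical, chosen only to show the effect.

```python
from math import floor, log10


def round_sig(value, sig=3):
    """Round a non-zero value to the given number of significant figures."""
    return round(value, sig - 1 - floor(log10(abs(value))))


# Hypothetical full-display values, as if stored on a GDC
a_full, b_full = 2.34567, 1.23456

# The same values after rounding to 3 significant figures
a_rounded, b_rounded = round_sig(a_full), round_sig(b_full)

x = 40
full_prediction = a_full * x + b_full
rounded_prediction = a_rounded * x + b_rounded

print("full:   ", full_prediction)
print("rounded:", rounded_prediction)
print("difference:", abs(full_prediction - rounded_prediction))
```

The rounding error in a is multiplied by x, so the larger the x value, the further the rounded prediction drifts from the full-precision one.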

Worked example

Barry is a music teacher. For 7 students, he records the time they spend practising per week (x hours) and their score in a test (y %).

Time (x hours)    2    5    6    7   10   11   12
Score (y %)      11   49   55   75   63   68   82

a)
Write down the equation of the regression line of y on x, giving your answer in the form y = ax + b where a and b are constants to be found.

[Image: worked solution to part (a)]

b)
Give an interpretation of the value of a.

[Image: worked solution to part (b)]

c)
Another of Barry’s students practises for 15 hours a week. Estimate their score and comment on the validity of this prediction.

[Image: worked solution to part (c)]
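
The three parts can be checked with a short Python sketch that stands in for the GDC. It uses Barry's data from the table and assumes the standard least squares formulas a = Sxy / Sxx and b = ȳ − a x̄. The variable names are illustrative, and the values come from this sketch rather than from the original worked solution.

```python
# Barry's data: practice time (hours per week) and test score (%)
times = [2, 5, 6, 7, 10, 11, 12]
scores = [11, 49, 55, 75, 63, 68, 82]

n = len(times)
x_bar = sum(times) / n
y_bar = sum(scores) / n

# Part (a): least squares regression line of y on x
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(times, scores))
s_xx = sum((x - x_bar) ** 2 for x in times)
a = s_xy / s_xx
b = y_bar - a * x_bar
print(f"y = {a:.3g}x + {b:.3g}")

# Part (c): prediction for 15 hours, using the unrounded a and b
new_time = 15
predicted = a * new_time + b
is_extrapolation = not (min(times) <= new_time <= max(times))
print(f"predicted score: {predicted:.3g}%")
print("extrapolation:", is_extrapolation)
```

To 3 significant figures this gives a ≈ 5.57 and b ≈ 15.4, so the line is y = 5.57x + 15.4. For part (b), a ≈ 5.57 means each extra hour of practice per week is associated with an increase of about 5.57 percentage points in the test score. For part (c), x = 15 gives a predicted score of about 98.9%, but 15 hours lies outside the range of the data (2 to 12 hours), so this is extrapolation and the prediction is unlikely to be reliable.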


Author: Dan

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.