Goodness of Fit Test (DP IB Maths: AI SL)

Revision Note

Author: Dan

Expertise: Maths

Chi-Squared GOF: Uniform

What is a chi-squared goodness of fit test for a given distribution?

  • A chi-squared (χ²) goodness of fit test uses data from a sample to test whether the population follows a given distribution
  • This could be that: 
    • the proportions of the population for different categories follow a given ratio 
    • the population follows a uniform distribution
      • This means all outcomes are equally likely

What are the steps for a chi-squared goodness of fit test for a given distribution?

  • STEP 1: Write the hypotheses
    • H0 : Variable X can be modelled by the given distribution
    • H1 : Variable X cannot be modelled by the given distribution
      • Make sure you clearly write what the variable is and don’t just call it X
  • STEP 2: Calculate the expected frequencies
    • Split the total frequency using the given ratio
    • For a uniform distribution: divide the total frequency N by the number of possible outcomes k
  • STEP 3: Calculate the degrees of freedom for the test
    • For k possible outcomes, the number of degrees of freedom is ν = k − 1
  • STEP 4: Enter the frequencies and the degrees of freedom into your GDC
    • Enter the observed and expected frequencies as two separate lists
    • Your GDC will then give you the χ² statistic and its p-value
    • The χ² statistic is denoted χ²_calc
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • EITHER compare the χ² statistic with the given critical value
      • If χ² statistic > critical value then reject H0
      • If χ² statistic < critical value then do not reject H0
    • OR compare the p-value with the given significance level
      • If p-value < significance level then reject H0
      • If p-value > significance level then do not reject H0
  • STEP 6: Write your conclusion
    • If you reject H0
      • There is sufficient evidence to suggest that variable X does not follow the given distribution
      • Therefore this suggests that the data is not distributed as claimed
    • If you do not reject H0
      • There is insufficient evidence to suggest that variable X does not follow the given distribution
      • Therefore the data is consistent with the claimed distribution
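Steps 3 to 5 are what the GDC carries out for you. The Python sketch below is one way to reproduce them using only the standard library. The function names `chi2_upper_tail` and `chi2_gof` are hypothetical. The sketch computes χ²_calc = Σ(O − E)²/E and sets ν = k − 1. It then finds the p-value as the upper-tail probability of the χ² distribution, using the closed-form tail sums that exist when the degrees of freedom are a whole number. This stands in for the calculator's own routine and is not a description of how any GDC works.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Return P(chi-squared with df degrees of freedom > x) for a whole-number df >= 1.

    Uses the closed-form tail sums for integer df, so only the standard library is needed.
    """
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: 2(1 - Phi(sqrt x)) + 2 phi(sqrt x) * sum of sqrt(x)^(2r-1) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2 * math.pi)  # standard normal pdf at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)  # next term in the sum
    # erfc(sqrt(x/2)) equals 2(1 - Phi(sqrt x))
    return math.erfc(math.sqrt(half)) + 2 * density * total


def chi2_gof(observed, expected):
    """Return (chi-squared statistic, degrees of freedom, p-value), with df = k - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_upper_tail(stat, df)
```

The observed and expected lists must list the categories in the same order, just as the two GDC lists must.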

Worked example

A car salesman is interested in how his sales are distributed and records his sales results over a period of six weeks. The data is shown in the table.

Week | 1 | 2 | 3 | 4 | 5 | 6
Number of sales | 15 | 17 | 11 | 21 | 14 | 12

A χ² goodness of fit test is to be performed on the data at the 5% significance level to find out whether the data fits a uniform distribution.

a)
Find the expected frequency of sales for each week if the data were uniformly distributed.

b)
Write down the null and alternative hypotheses.

c)
Write down the number of degrees of freedom for this test.

d)
Calculate the p-value.

e)
State the conclusion of the test. Give a reason for your answer.
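The working for this example can be checked with the short Python sketch below. It uses only the standard library, and the variable names are illustrative. It splits the total of 90 sales equally over the 6 weeks and sets ν = 6 − 1 = 5. In place of the GDC, it evaluates the p-value with the closed-form upper tail of the χ² distribution for exactly 5 degrees of freedom.

```python
import math

# Observed weekly sales from the table (weeks 1 to 6)
observed = [15, 17, 11, 21, 14, 12]

# STEP 2: uniform model, so every week expects N / k sales
N, k = sum(observed), len(observed)
expected = [N / k] * k

# STEP 3: degrees of freedom
nu = k - 1
assert nu == 5  # the tail formula below is only valid for 5 degrees of freedom

# STEP 4: chi-squared statistic, the sum of (O - E)^2 / E
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# p-value: P(chi^2_5 > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3)
x = chi2_calc
p_value = math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)

# STEP 5: compare the p-value with the 5% significance level
reject_h0 = p_value < 0.05
print(expected[0], nu, round(chi2_calc, 3), round(p_value, 3), reject_h0)
```

This gives 15 expected sales per week, ν = 5, χ²_calc = 4.4 and a p-value of about 0.49. That is greater than 0.05, so H0 is not rejected: there is insufficient evidence to suggest that the weekly sales are not uniformly distributed.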

Chi-Squared GOF: Binomial

What is a chi-squared goodness of fit test for a binomial distribution?

  • A chi-squared (χ²) goodness of fit test uses data from a sample to test whether the population follows a binomial distribution
    • You will be given the value of p for the binomial distribution

What are the steps for a chi-squared goodness of fit test for a binomial distribution?

  • STEP 1: Write the hypotheses
    • H0 : Variable X can be modelled by the binomial distribution B(n, p)
    • H1 : Variable X cannot be modelled by the binomial distribution B(n, p)
      • Make sure you clearly write what the variable is and don’t just call it X
      • State the values of n and p clearly
  • STEP 2: Calculate the expected frequencies
    • Find the probability of each outcome using the binomial distribution: P(X = x)
    • Multiply the probability by the total frequency: P(X = x) × N
  • STEP 3: Calculate the degrees of freedom for the test
    • For k outcomes, the number of degrees of freedom is ν = k − 1
  • STEP 4: Enter the frequencies and the degrees of freedom into your GDC
    • Enter the observed and expected frequencies as two separate lists
    • Your GDC will then give you the χ² statistic and its p-value
    • The χ² statistic is denoted χ²_calc
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • EITHER compare the χ² statistic with the given critical value
      • If χ² statistic > critical value then reject H0
      • If χ² statistic < critical value then do not reject H0
    • OR compare the p-value with the given significance level
      • If p-value < significance level then reject H0
      • If p-value > significance level then do not reject H0
  • STEP 6: Write your conclusion
    • If you reject H0
      • There is sufficient evidence to suggest that variable X does not follow the binomial distribution B(n, p)
      • Therefore this suggests that the data does not follow B(n, p)
    • If you do not reject H0
      • There is insufficient evidence to suggest that variable X does not follow the binomial distribution B(n, p)
      • Therefore the data is consistent with B(n, p)
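STEP 2 for a binomial model can be sketched in Python as follows. It uses only the standard library, and `binomial_expected` is a hypothetical helper name. The sketch returns P(X = x) × N for each outcome x = 0, 1, …, n under B(n, p).

```python
from math import comb


def binomial_expected(n: int, p: float, total: int) -> list:
    """Expected frequencies P(X = x) * N for x = 0, 1, ..., n under B(n, p)."""
    # comb(n, x) * p^x * (1 - p)^(n - x) is the binomial probability P(X = x)
    return [comb(n, x) * p**x * (1 - p) ** (n - x) * total for x in range(n + 1)]
```

The probabilities sum to 1, so the expected frequencies should add up to N. This is a quick check on the working.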

Worked example

A stage in a video game has three boss battles. 1000 people try this stage of the video game and the number of bosses defeated by each player is recorded.

Number of bosses defeated | 0 | 1 | 2 | 3
Frequency | 490 | 384 | 111 | 15

A χ² goodness of fit test at the 5% significance level is used to decide whether the number of bosses defeated can be modelled by a binomial distribution with a 20% probability of success.

a)
State the null and alternative hypotheses.

b)
Assuming the binomial distribution holds, find the expected number of people that would defeat exactly one boss.

c)
Calculate the p-value for the test.

d)
State the conclusion of the test. Give a reason for your answer.
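Below is a minimal Python sketch of the whole test for the video game data. It uses only the standard library, and the variable names are illustrative. It takes n = 3 and p = 0.2 from the question. In place of the GDC, it computes the p-value with the closed-form upper tail of the χ² distribution for exactly 3 degrees of freedom.

```python
import math

# Observed frequencies for 0, 1, 2 and 3 bosses defeated
observed = [490, 384, 111, 15]
n, p = 3, 0.2          # B(3, 0.2): three boss battles, 20% chance of success
N = sum(observed)      # 1000 players

# STEP 2: expected frequencies P(X = x) * N
expected = [math.comb(n, x) * p**x * (1 - p) ** (n - x) * N for x in range(n + 1)]

# STEP 3: degrees of freedom
nu = len(observed) - 1
assert nu == 3  # the tail formula below is only valid for 3 degrees of freedom

# STEP 4: chi-squared statistic and p-value
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
x = chi2_calc
# P(chi^2_3 > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# STEP 5: compare the p-value with the 5% significance level
reject_h0 = p_value < 0.05
print([round(e, 2) for e in expected], round(chi2_calc, 3), round(p_value, 4), reject_h0)
```

This gives expected frequencies of 512, 384, 96 and 8, so 384 people would be expected to defeat exactly one boss. It also gives χ²_calc ≈ 9.41 and a p-value of about 0.024. As 0.024 < 0.05, H0 is rejected: there is sufficient evidence to suggest that the number of bosses defeated cannot be modelled by B(3, 0.2).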

Chi-Squared GOF: Normal

What is a chi-squared goodness of fit test for a normal distribution?

  • A chi-squared (χ²) goodness of fit test uses data from a sample to test whether the population follows a normal distribution
    • You will be given the values of μ and σ for the normal distribution

What are the steps for a chi-squared goodness of fit test for a normal distribution?

  • STEP 1: Write the hypotheses
    • H0 : Variable X can be modelled by the normal distribution N(μ, σ²)
    • H1 : Variable X cannot be modelled by the normal distribution N(μ, σ²)
      • Make sure you clearly write what the variable is and don’t just call it X
      • State the values of μ and σ clearly
  • STEP 2: Calculate the expected frequencies
    • Find the probability of each class interval using the normal distribution: P(a < X < b)
      • The class intervals at the two ends are unbounded, so use P(X < b) for the lowest class and P(X > a) for the highest class
    • Multiply the probability by the total frequency: P(a < X < b) × N
  • STEP 3: Calculate the degrees of freedom for the test
    • For k class intervals, the number of degrees of freedom is ν = k − 1
  • STEP 4: Enter the frequencies and the degrees of freedom into your GDC
    • Enter the observed and expected frequencies as two separate lists
    • Your GDC will then give you the χ² statistic and its p-value
    • The χ² statistic is denoted χ²_calc
  • STEP 5: Decide whether there is evidence to reject the null hypothesis
    • EITHER compare the χ² statistic with the given critical value
      • If χ² statistic > critical value then reject H0
      • If χ² statistic < critical value then do not reject H0
    • OR compare the p-value with the given significance level
      • If p-value < significance level then reject H0
      • If p-value > significance level then do not reject H0
  • STEP 6: Write your conclusion
    • If you reject H0
      • There is sufficient evidence to suggest that variable X does not follow the normal distribution N(μ, σ²)
      • Therefore this suggests that the data does not follow N(μ, σ²)
    • If you do not reject H0
      • There is insufficient evidence to suggest that variable X does not follow the normal distribution N(μ, σ²)
      • Therefore the data is consistent with N(μ, σ²)
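STEP 2 for a normal model, including the two unbounded end classes, can be sketched with Python's `statistics.NormalDist` (standard library, Python 3.8 or later). `normal_expected` is a hypothetical helper name. The sketch assumes that the first class is everything below the lowest boundary and the last class is everything from the highest boundary upwards.

```python
from statistics import NormalDist


def normal_expected(boundaries, mu, sigma, total):
    """Expected frequencies P(class) * N under N(mu, sigma^2).

    `boundaries` are the cut points between classes, e.g. [470, 520, 570] gives
    the classes X < 470, 470 <= X < 520, 520 <= X < 570 and X >= 570.
    """
    model = NormalDist(mu, sigma)
    cdf = [model.cdf(b) for b in sorted(boundaries)]
    # Lowest class is unbounded below: P(X < first boundary)
    probs = [cdf[0]]
    # Bounded classes: P(a < X < b) = cdf(b) - cdf(a)
    probs += [upper - lower for lower, upper in zip(cdf, cdf[1:])]
    # Highest class is unbounded above: P(X > last boundary)
    probs.append(1 - cdf[-1])
    return [p * total for p in probs]
```

Using P(X < b) and P(X > a) for the end classes means the probabilities add up to 1, so the expected frequencies add up to N.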

Worked example

300 marbled ducks in Quacktown are weighed and the results are shown in the table below.

Mass, m (g) | Frequency
m < 470 | 10
470 ≤ m < 520 | 158
520 ≤ m < 570 | 123
m ≥ 570 | 9

A χ² goodness of fit test at the 10% significance level is used to decide whether the mass of a marbled duck can be modelled by a normal distribution with mean 520 g and standard deviation 30 g.

a)
Calculate the expected frequencies, giving your answers correct to 2 decimal places.

b)
Write down the null and alternative hypotheses.

c)
Calculate the χ² statistic.

d)
Given that the critical value is 6.251, state the conclusion of the test. Give a reason for your answer.
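The calculations in this example can be checked with the Python sketch below. It uses only the standard library, and the variable names are illustrative. It uses the mean of 520 g and standard deviation of 30 g from the question and treats the two end classes as unbounded. It then compares the statistic with the critical value of 6.251 given in part (d).

```python
from statistics import NormalDist

# Observed frequencies for m < 470, 470 <= m < 520, 520 <= m < 570, m >= 570
observed = [10, 158, 123, 9]
N = sum(observed)  # 300 ducks

# STEP 2: class probabilities under N(520, 30^2); the two end classes are unbounded
model = NormalDist(mu=520, sigma=30)
c470, c520, c570 = (model.cdf(b) for b in (470, 520, 570))
probs = [c470, c520 - c470, c570 - c520, 1 - c570]
expected = [prob * N for prob in probs]

# STEP 4: chi-squared statistic, the sum of (O - E)^2 / E
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# STEP 5: compare with the critical value given in part (d), at 10% significance
critical_value = 6.251
reject_h0 = chi2_calc > critical_value
print([round(e, 2) for e in expected], round(chi2_calc, 2), reject_h0)
```

This gives expected frequencies of 14.34, 135.66, 135.66 and 14.34, and, using the unrounded values, χ²_calc ≈ 8.16. Since 8.16 > 6.251, H0 is rejected. There is sufficient evidence at the 10% level to suggest that the mass of a marbled duck cannot be modelled by N(520, 30²).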

Dan graduated from the University of Oxford with a First class degree in mathematics. As well as teaching maths for over 8 years, Dan has marked a range of exams for Edexcel, tutored students and taught A Level Accounting. Dan has a keen interest in statistics and probability and their real-life applications.