Summary statistics for each exam version are shown in Table 3.7. The teacher would like to evaluate whether this difference is so large that it provides convincing evidence that Version B was more difficult (on average) than Version A.
| Version | $n$ | $\bar{x}$ | $s$ | min | max |
|---|---|---|---|---|---|
| A | 30 | 79.4 | 14 | 45 | 100 | 
| B | 27 | 74.1 | 20 | 32 | 100 | 
Construct a two-sided hypothesis test to evaluate whether the observed difference in sample means, $\bar{x}_A - \bar{x}_B = 5.3$, might be due to chance.
Answer. Because the teacher did not expect one exam to be more difficult prior to examining the test results, she should use a two-sided hypothesis test. $H_0$: the exams are equally difficult, on average; $\mu_A - \mu_B = 0$. $H_A$: one exam was more difficult than the other, on average; $\mu_A - \mu_B \neq 0$.
To evaluate the hypotheses in Exercise 3.4.1 using the $t$ distribution, we must first verify assumptions. (a) Does it seem reasonable that the scores are independent within each group? (b) What about the normality condition for each group? (c) Do you think scores from the two groups would be independent of each other (i.e., the two samples are independent)?
Answer. (a) It is probably reasonable to conclude the scores are independent. (b) The summary statistics suggest the data are roughly symmetric about the mean, and it doesn’t seem unreasonable to suggest the data might be normal. Note that since these samples are each nearing 30, moderate skew in the data would be acceptable. (c) It seems reasonable to suppose that the samples are independent since the exams were handed out randomly.

After verifying the conditions for each sample and confirming the samples are independent of each other, we are ready to conduct the test using the $t$ distribution. In this case, we are estimating the true difference in average test scores using the sample data, so the point estimate is $\bar{x}_A - \bar{x}_B = 5.3$. The standard error of the estimate can be calculated using Equation (3.3):

$$SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} = \sqrt{\frac{14^2}{30} + \frac{20^2}{27}} = 4.62$$
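Finally, we construct the test statistic:

$$T = \frac{\text{point estimate} - \text{null value}}{SE} = \frac{(79.4 - 74.1) - 0}{4.62} = 1.15$$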
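The arithmetic behind the standard error and the test statistic can be checked with a few lines of R. This is only a sketch built from the summary statistics in Table 3.7; the variable names (`n_A`, `xbar_A`, `s_A`, and so on) are illustrative and do not come from the text.

```r
# Summary statistics from Table 3.7 (variable names are illustrative)
n_A <- 30; xbar_A <- 79.4; s_A <- 14
n_B <- 27; xbar_B <- 74.1; s_B <- 20

# Point estimate of the difference in average scores
point_est <- xbar_A - xbar_B

# Standard error of the difference in sample means (unpooled variances)
se <- sqrt(s_A^2 / n_A + s_B^2 / n_B)

# Test statistic under the null value mu_A - mu_B = 0
t_stat <- (point_est - 0) / se

# Rounded to two decimals: about 5.30, 4.62, and 1.15
round(c(point_est = point_est, se = se, t_stat = t_stat), 2)
```

Carrying the unrounded standard error through gives a test statistic near 1.147, which rounds to the 1.15 used in the text.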
If we have a computer handy, we can identify the degrees of freedom as 45.97. Otherwise we use the smaller of $n_A - 1 = 29$ and $n_B - 1 = 26$: $df = 26$.
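The text does not say which formula the computer uses. A reasonable assumption is the Welch-Satterthwaite approximation, which is what R's `t.test` applies when variances are not pooled. The sketch below, with illustrative variable names, reproduces the value of 45.97 from the summary statistics alone and compares it with the conservative choice.

```r
# Sample sizes and standard deviations from Table 3.7 (names are illustrative)
n_A <- 30; s_A <- 14
n_B <- 27; s_B <- 20

# Estimated variance of each sample mean
v_A <- s_A^2 / n_A
v_B <- s_B^2 / n_B

# Welch-Satterthwaite approximation (assumed to be the "computer" method)
df_welch <- (v_A + v_B)^2 / (v_A^2 / (n_A - 1) + v_B^2 / (n_B - 1))

# Conservative choice: the smaller of n_A - 1 and n_B - 1
df_conservative <- min(n_A - 1, n_B - 1)

round(df_welch, 2)   # about 45.97
df_conservative      # 26
```

The conservative value is never larger than the approximated one, so using it gives a slightly larger p-value and makes the test a little less likely to reject.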
Identify the p-value, the two-tailed area under the $t$ distribution shown in the accompanying figure. Use $df = 26$.
Answer. We use $df = 26$. In R, the two-tailed p-value is `2*(1-pt(1.15,df=26)) = 0.2606121`. Because the p-value is larger than 0.05, we do not reject the null hypothesis. That is, the data do not convincingly show that one exam version is more difficult than the other, and the teacher should not be convinced that she should add points to the Version B exam scores.
In Exercise 3.4.3, we could have used $df = 45.97$: `2*(1-pt(1.15,df=45.97)) = 0.256`. This p-value is also larger than 0.05, so the conclusion is unchanged.
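The two p-value calculations can be placed side by side in R. The helper `two_sided_p` below is a hypothetical convenience function, not something defined in the text; it uses `pt` with `lower.tail = FALSE`, which gives the same upper-tail area as `1 - pt(...)`. The significance level of 0.05 is the threshold the answer above compares against.

```r
t_stat <- 1.15   # rounded test statistic used in the text
alpha  <- 0.05   # threshold used in the answer above

# Two-sided p-value: twice the upper-tail area beyond |t|
# (hypothetical helper, introduced here for illustration)
two_sided_p <- function(t, df) {
  2 * pt(abs(t), df = df, lower.tail = FALSE)
}

# Conservative df versus the software-reported df
dfs <- c(conservative = 26, software = 45.97)
p_values <- sapply(dfs, function(df) two_sided_p(t_stat, df))

round(p_values, 3)   # about 0.261 and 0.256
p_values > alpha     # both TRUE, so the null hypothesis is not rejected
```

Both choices of degrees of freedom lead to the same decision here because the test statistic is far from the rejection region.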