
P1.T2.20.14. Hypothesis testing

Nicole Seaman

Director of FRM Operations
Staff member
Subscriber
Learning objectives: Construct an appropriate null hypothesis and alternative hypothesis and distinguish between the two. Differentiate between a one-sided and a two-sided test and identify when to use each test. Explain the difference between Type I and Type II errors and how these relate to the size and power of a test. Understand how a hypothesis test and a confidence interval are related.

Questions:

20.14.1. Peter Parker at Betalab Bank emailed a survey to the bank's customers. The survey included a question that asked them to rank their customer satisfaction on a scale from one to 10. He received 51 responses, and he considers that a random sample (n = 51). Among this sample, the average satisfaction score (on a scale of one to 10) is 8.50 with a sample standard deviation of 1.90. Betalab's CEO is Mary-Jane, FRM, and she hopes that the bank's average customer satisfaction is at least 9.0. Mary-Jane holds the FRM designation so she understands that acceptance of the null is more accurately a failure to reject the null, but she is a practical person. Her null hypothesis is that the population's average customer satisfaction is at least 9.0 (i.e., H0: μ ≥ 9.0 and H1: μ < 9.0). Peter shares his sample findings with five of his colleagues, and each colleague gives different input, as follows:

I. Albert says the test statistic is (8.5 - 9.0) ÷ [1.90 / SQRT(51)] = -1.88, or |-1.88| = 1.88
II. Betty says that if the sample size were doubled, ceteris paribus (i.e., same sample mean and sample standard deviation), the test statistic will increase about +41%
III. Chris says that (for n = 51) Mary-Jane should use a one-sided test, and with one-sided 95.0% confidence (aka, 5.0% significance) she should reject the null
IV. Derek says that (for n = 51) Mary-Jane can accept (aka, fail to reject) the CEO's null hypothesis with 95.0% confidence but only if she artificially switches to a two-sided hypothesis (i.e., H0: μ = 9.0 and H1: μ ≠ 9.0)
V. Erin agrees with Chris and says that Mary-Jane should use a one-sided test per the CEO's one-sided hypothesis but notes that Peter can accept (aka, fail to reject) the null with one-sided 99.0% confidence (aka, 1.0% significance)

Which of the five statements is (are) correct?

a. None of the statements are correct
b. Only I. and II. are correct
c. Only IV and V. are correct
d. All five of the statements are correct

20.14.2. The Fulcrum Jetpack is a high-risk, high-reward leveraged exchange-traded note (ETN). It claims a (population) mean monthly excess return of at least 200 basis points. Over the last five years (i.e., sample size, n = 60 months), the sample excess mean return was 230 basis points with a standard deviation of 120 basis points. Each of the following is true EXCEPT which is false?

a. If the p-value is 0.02880 then the power of the test is 97.120%
b. In the case of the one-sided null hypothesis test, we should reject at 95.0% confidence but we can accept (aka, fail to reject) at 99.0% confidence
c. If we increase confidence from 95.0% to 99.0% (aka, decrease the significance from 5.0% to 1.0%), ceteris paribus, the power will decrease
d. We can increase the power either by decreasing the confidence level (aka, increasing the significance level) and/or, ceteris paribus, increasing the sample size

20.14.3. Janice has been asked to backtest her firm's 95.0% one-day value at risk (VaR) model. If she assumes exceptions (i.e., days when the loss exceeds the VaR level) are independent, then the binomial distribution describes the number of exceptions. The historical sample is 1,000 days based on 250 days per year. The firm's 95.0% confident one-day VaR is $38,000. If the VaR model is accurate, she expects to observe 5.0% * 1,000 days = 50 exceptions, but that is just the average of a binomial distribution. She also knows that if n*p and n*(1-p) are greater than 10, she can approximate the binomial with the normal (as a rule of thumb); indeed, 5%*1,000 > 10. Using the normal approximation, her 95.0% confidence interval is given by 1,000*5.0% +/- [1.96 × SQRT(5% × 95% × 1,000)], or (36.5, 63.5). In addition, each of the following statements is true EXCEPT which is false?

a. If she increases the confidence level of the hypothesis test (on the same 95.0% VaR model) from 95.0% to 99.0%, the power of her test will decrease
b. If the standard deviation of the VaR is approximately $4,000, then the confidence interval around her VaR is $38,000 +/- 1.96 × $4,000/SQRT(1,000) or ($37,752; $38,248).
c. She can increase the power of her hypothesis test by decreasing its confidence level (e.g., from 95% to 90.0%) but this will increase the probability of a Type I error
d. If she increases the confidence level of the hypothesis test (on the same 95.0% VaR model) from 95.0% to 99.0%, the 99.0% confident interval is given by (32.2, 67.8) exceptions
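
The normal approximation in the stem of 20.14.3 can be sketched as follows. This is only an illustration of the formula n*p +/- z × SQRT(n × p × (1-p)); the function name `exception_interval` is hypothetical, and the z-value comes from Python's standard-library normal distribution rather than a rounded 1.96.

```python
import math
from statistics import NormalDist


def exception_interval(n_days, p_exception, confidence):
    """Two-sided interval for the number of VaR exceptions.

    Uses the normal approximation to the binomial, which assumes
    independent exceptions and n*p, n*(1-p) both greater than 10.
    """
    mean = n_days * p_exception
    sd = math.sqrt(n_days * p_exception * (1 - p_exception))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    return mean - z * sd, mean + z * sd


# Inputs from the question: 1,000 days, 95.0% VaR (5% exception rate)
low, high = exception_interval(1000, 0.05, 0.95)
print(round(low, 1), round(high, 1))  # 36.5 63.5, as in the question stem
```

Passing a different `confidence` widens or narrows the interval around the same expected 50 exceptions.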


etzaros

New Member
Subscriber
Hello David, I am a little bit confused regarding the initial setup of the null hypothesis. Generally, I have understood that the belief is part of the alternative. However, in the case of P1.T2.20.14, the belief is that μ is at least 9 (μ ≥ 9). In my head, this is translated to Ha: μ>9. But then the exercise states that Ha: μ<9. Where is my mistake?
Mary-Jane, FRM, and she hopes that the bank's average customer satisfaction is at least 9.0. Mary-Jane holds the FRM designation so she understands that acceptance of the null is more accurately a failure to reject the null, but she is a practical person. Her null hypothesis is that the population's average customer satisfaction is at least 9.0 (i.e., H0: μ ≥ 9.0 and H1: μ < 9.0).

Regards, Stathis

lushukai

Active Member
Subscriber
Hi @etzaros ,

When you mentioned "But then the exercise states that Ha: μ<9. Where is my mistake?", are you referring to the statement in the question "Among this sample, the average satisfaction score (on a scale of one to 10) is 8.50 with a sample standard deviation of 1.90"? If that is the case, the average is based on a sample (51 customers), which does not represent the average of the population (all of bank's customers) and there might be a possibility that the actual average be above 9.0 (if we survey the whole population). However (due to reasons of practicality), we can't do that, which is why we employed a statistical test.

Do let me know if I understood your question correctly!

etzaros

New Member
Subscriber
Hello, @lushukai,

First, thanks for the quick reply. Regarding the question, my point is that, as a rule of thumb, I had understood the belief to be the alternative hypothesis (in this case μ ≥ 9). But the exercise states H0: μ ≥ 9, and this is the tricky point.
My main doubt is whether I have misunderstood the rule.

Regards,

S.

David Harper CFA FRM

Staff member
Subscriber
Hi @etzaros (thank you @lushukai ) I do understand that we may anchor the affirmative belief with the alternative hypothesis, but it isn't necessarily the case, and I'm not sure that practice itself informs the directionality of a one-sided alternative. In question 20.14.1, we have an (observed) sample mean of 8.50 and the CEO's belief that the true (population) mean is at least 9.0. This mental challenge never arises in the more common two-sided test because, if the test were two-sided, the null/alternative hypotheses would be H0: μ = 9.0 and H1: μ ≠ 9.0. Notice this does not itself imply that the "belief" is that the true mean is unequal to 9.0, but instead can validly reflect a belief that the true mean is equal to 9.0. The null can reflect the affirmative belief. The notion of which you speak is of course common, but it arises because the most common test is of a (e.g., regression) coefficient where the researcher seeks to reject the null hypothesis that some parameter is equal to zero; i.e., their belief is that the parameter is significantly different from zero such that they embrace the alternative. But that is not necessarily the test! The most common null is zero because we want to affirm a non-zero parameter. But we can observe a non-zero sample mean and believe that other non-zero values are true.

My view is that it is even more fundamentally a function of where the one-sided rejection region must be located: we only have one choice! In question 20.14.1 we have the CEO Mary-Jane's preference for a one-sided test given the null of 9.0 (i.e., the null must include the equal sign) and the observation of 8.50. Those are the three assumptions: an observed sample mean of 8.50; a null of 9.0 (which must have the "="); and the preference for a one-sided test. Consider the problem of framing the hypothesis as follows:

H0: μ ≤ 9.0 and H1: μ > 9.0

The problem with this is that any observation less than 9.0 is already in the acceptance region! No observation less than 9.0 can lead to rejection. If the observed sample average (instead) were 2.0, then despite its implausible distance, it locates within the acceptance region, such that we fail to reject, which makes tests of such a null (to my thinking) absurd. Rather, to my thinking the only one-sided test here, where the null (i.e., 9.0) is greater than the observed sample mean (i.e., 8.5), is the following:

H0: μ ≥ 9.0 and H1: μ < 9.0

... because now we can test whether the observation falls, or does not, into the rejection (aka, critical) region. The critical region must lie to the left. In this way, rather than say "the alternative is what is to be affirmed", I think the robust idea of a one-sided test is "the alternative must lie in the direction of the observation because that's where the rejection region is and we are looking to determine if it crosses the line, so to speak, into statistical unlikelihood given our confidence." Yes, it's true that in this case, the null matches Mary-Jane's hypothesis that the true score is greater than or equal to 9.0, but I see no problem with that. Sorry for the length but I did want to think it through carefully. Let me know if you think the other test is even sensical, I just don't see any other approach (also, the whole question would seem nonsensical if it read "Mary-Jane believes the true score is less than or equal to 9.0" because our observation is less than 9.0. It seems to me if the null is greater than the observation, we've got to locate the rejection/critical region to the left, and then phrase accordingly). Thanks,
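
To make the left-tail rejection region concrete, here is a rough sketch using the numbers from question 20.14.1. The critical values are an assumption on my part: they are copied from a standard Student's t table at 50 degrees of freedom (n - 1), not computed, and the helper names are only illustrative.

```python
import math

# Inputs from question 20.14.1
n = 51
sample_mean = 8.50
sample_sd = 1.90
null_mean = 9.0

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - null_mean) / standard_error  # negative: observation is below the null

# Assumption: Student's t critical values for 50 degrees of freedom,
# taken from a standard t table (rounded to three decimals)
T_CRIT_50DF = {
    "one-sided 5%": 1.676,
    "two-sided 5%": 2.009,
    "one-sided 1%": 2.403,
}


def rejects_left_tail(t, critical):
    """H0: mu >= null, H1: mu < null. Reject only far enough to the left."""
    return t < -critical


def rejects_two_sided(t, critical):
    """H0: mu = null, H1: mu != null. Reject in either tail."""
    return abs(t) > critical


decisions = {
    "one-sided 5%": rejects_left_tail(t_stat, T_CRIT_50DF["one-sided 5%"]),
    "two-sided 5%": rejects_two_sided(t_stat, T_CRIT_50DF["two-sided 5%"]),
    "one-sided 1%": rejects_left_tail(t_stat, T_CRIT_50DF["one-sided 1%"]),
}

print(f"t statistic: {t_stat:.2f}")
for test_name, rejected in decisions.items():
    print(test_name, "-> reject" if rejected else "-> fail to reject")
```

Note that with H0: μ ≤ 9.0 instead, the rejection region would sit on the right, and a negative test statistic like this one could never land in it.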


David Harper CFA FRM

Staff member
Subscriber
I was musing about this in our Slack. Let me abstract to a relatable example. Let's say our null hypothesis is that market beta β = 1.0. Our issue concerns the one-sided test. My view is:
• If we observe a sample mean of 1.30, the one-sided null must be H0: μ ≤ 1.0 because the (only, per one-sided) rejection region needs to be on the right. In this scenario, if instead our null is H0: μ ≥ 1.0, how are we supposed to entertain the possibility of rejecting? Any sample mean above 1.0 falls into the one-sided acceptance region. To me it makes no sense to insist, in this case, that the null is the straw man: an observation of 1.30 against a null of H0: μ ≥ 1.0 is supporting evidence! And so is 2.60 or 9.99 for that matter!
• If we observe a sample mean of 0.70, the one-sided null must be H0: μ ≥ 1.0 for the same reason. I hope that's interesting!
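
The rule of thumb in the two bullets above (the alternative lies on the side of the observation) can be written as a tiny helper. This is just a sketch of that view, not a general statistical rule; the function name and the string format are hypothetical.

```python
def one_sided_hypotheses(observed, null_value):
    """Return (H0, H1) with the rejection region on the side of the observation."""
    if observed > null_value:
        # Observation is above the null: rejection region must be on the right
        return (f"H0: beta <= {null_value}", f"H1: beta > {null_value}")
    if observed < null_value:
        # Observation is below the null: rejection region must be on the left
        return (f"H0: beta >= {null_value}", f"H1: beta < {null_value}")
    # Observation equals the null value: no direction is suggested
    raise ValueError("observation equals the null value; no one-sided direction")


print(one_sided_hypotheses(1.30, 1.0))
print(one_sided_hypotheses(0.70, 1.0))
```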