
Z-score, t-statistics

Thread starter #1

I would like to ask what is the correct form of Z and t-statistics. If I am not mistaken it is


where mu = population mean, observed value or beta in case of regression and X(h) is the tested value or null hypothesis.

Are there any cases when the numerator should be X(h) - mu instead? I found several in the question sets (P1.T2.317.3, P1.T2.319.1, P1.T2.403.2, P1.T2.405.1, P1.T2.405.2)

I guess I am missing some piece to understand...Thank you a lot!

David Harper CFA FRM

Staff member
Hi @Tereza

It should be Z = (X - µ)/SE(X) or t = (X - µ)/SE(X), where X is the observed value and µ is the population mean (or, hypothesized population mean). Our questions will follow this format. Notice how the population parameter is denoted by a Greek µ but the sample mean is denoted by a Roman X. The "tested value" is a bit ambiguous. We are subtracting the "hypothesized population mean" (aka, µ) from the "observed sample mean."

In many default use cases, the hypothesized population mean is zero, such that t = (X - µ)/SE(X) = (X - 0)/SE(X) reduces to t = X/SE(X), which can be called the "test statistic" or "computed t value" (and can be compared to the "critical value" or "lookup value," which is based on the standardized table).

We just recorded an updated Stock and Watson video (Chapter 5); my snapshot from it shows a (sample) regression function where TestScore = 698.9 - 2.280*STR; STR is the independent variable, the student-teacher ratio. As the SE is 0.48, the test statistic (aka, t stat) is given by (-2.280 - 0)/0.48 = -4.75, or its absolute value 4.75 (+/- in this context is not consequential because the student's t is symmetrical, and the interpretation is "our observed sample mean is 4.75 standard deviations, that is, standard errors, away from the hypothesized mean"). Stock and Watson actually use slightly different terminology:

t = (estimator - hypothesized value)/(standard error of the estimator)

... which is a bit more technical by referring to the (observed) sample mean as an "estimator;" this is, of course, correct and reminds us that the sample mean is not the only estimator we can possibly generate. I hope that's helpful!
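As a minimal sketch of the arithmetic above, here is the same formula in Python. The function name `t_statistic` and its parameter names are hypothetical (they come from neither Stock and Watson nor our question sets); the inputs are the slope and standard error quoted for the TestScore regression.

```python
def t_statistic(estimate: float, hypothesized: float, standard_error: float) -> float:
    """(estimator - hypothesized value) / (standard error of the estimator)."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (estimate - hypothesized) / standard_error


# Slope on STR from the regression quoted above, tested against a null of zero.
t_slope = t_statistic(estimate=-2.280, hypothesized=0.0, standard_error=0.48)
print(round(t_slope, 2))       # -4.75
print(round(abs(t_slope), 2))  # 4.75; the sign does not matter for a two-sided test
```

Swapping the numerator to (hypothesized - estimate) would only flip the sign of the result, which is why the order matters for one-sided questions but not for the magnitude.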

Thread starter #3
Hello David,

Thank you a lot!

Just to confirm my understanding (if I follow the format from above):

1) The fund has an expected return of 8%. What is the probability that the return will be higher than 12%?
µ = 8% because this is the mean of the whole population; X = 12% because I can also read the question as: What is the prob. that the return of some sample from the whole population will be higher than 12%?
So basically, every time I have an example of the type "Expected value/mean is xxx. What is the probability that this value will be yyy?", xxx is µ (because it is a population parameter) and yyy is X (because it is a sample parameter).

2) Regarding the regression coef. significance testing
b = X, because b represents only one of the possible estimates, as it is the estimate from one sample. I am testing whether the real coefficient, beta, for the whole population (which plays the role of µ) is equal to 0.

Just one more point connected to 1): Whether I am testing >= or <= does not affect the t or Z calculation at all. It is only taken into account in the following step:
What is the prob. that a value is lower than z -> Prob (Z<z) -> use the look-up table directly;
What is the prob.that a value is higher than z -> prob(Z>z) = 1 - Prob(Z<z) -> 1 - look-up value.
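A rough sketch of that lookup step in Python, assuming a normal distribution. The question in 1) gives no standard deviation, so `sigma` below is a purely hypothetical value chosen for illustration; `NormalDist` from the standard library stands in for the printed look-up table.

```python
from statistics import NormalDist

mu = 0.08     # expected (population) return, from the question
x = 0.12      # threshold return, from the question
sigma = 0.05  # ASSUMPTION: hypothetical standard deviation, not given in the question

z = (x - mu) / sigma           # same Z whichever direction is asked about
p_below = NormalDist().cdf(z)  # P(Z < z): the look-up table value
p_above = 1.0 - p_below        # P(Z > z): 1 - look-up table value

print(round(z, 2), round(p_below, 4), round(p_above, 4))
```

With these assumed inputs z is 0.8, so the table value is about 0.788 and the probability of a return above 12% is about 0.212.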

Thank you again,

David Harper CFA FRM

Staff member
Hi @Tereza Sure thing! Yes, basically all correct, although I might quibble with some of the wording (sorry, occupational hazard) ...
  1. Per your example, say we take a sample size of 25 and observe a sample mean return of +12.0% with (sample) standard deviation of 5.0%. Our null hypothesis might be that the (population's) expected return is "truly" an 8.0% return. If we ask the two-sided question, our null is H0: µ = 8.0%, our alternative is HA: µ ≠ 8.0%, and we are asking "What is the probability, conditional on a true null, that we might randomly observe a sampled return that is at least +/- 4% away from the hypothesized mean; ie, 12% - 8%?" If we ask the one-sided question, our null would be H0: µ ≤ 8.0% such that our alternative is HA: µ > 8.0% and we are asking "What is the probability that we might randomly observe a sample return that is at least +4% greater than the hypothesized mean?" Notice how in your example we would not designate the one-sided null as H0: µ ≥ 8.0%, because we observed 12.0%, which is already consistent with that null, so we don't need to bother testing reject/accept!
  2. To your third question, continuing with the example, our test statistic for the sample mean is (12% - 8%)/[5.0%/sqrt(25)] = 4.0. This isn't affected by the Z/t choice or the one/two-sided choice. If our confidence is 95.0%, then the one-sided critical t is =T.INV(95%, 24) = 1.711 and the two-sided critical t is =T.INV.2T(5%, 24) = 2.064 (you can see these on the lookup table) and, in either case, we are rejecting the null because "a difference of +4.0%, which is 4.0 standard errors, is too far away from the hypothesized mean to be due to random sampling." We could retrieve the two-sided p value with =T.DIST.2T(4.0, 24) = 0.0527%, and this is the area in the two-sided rejection region, so we could say "We could reject this null with exactly 99.9473% confidence = 1 - p value" ... such that, actually, we would accept (ie, fail to reject) this null at 99.95% confidence!
  3. To your second question, let's say the hypothesized population regression function (PRF) is given by Y = a + βX, and our sample regression function returns Y = a + bX. Then (b) is an estimator and we will test with (b - β)/SE, similar to above (because these regression coefficients are actually conditional sample means also). I hope that's helpful!
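The numbers in items 1 and 2 can be checked with a short script. This is only a sketch under stated assumptions: it uses plain Python with no statistics package, so the helpers `t_pdf`, `t_cdf`, and `t_inv` are hypothetical stand-ins for Excel's T.DIST and T.INV family, built on Simpson's rule and bisection rather than on any library routine.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3.0  # area between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def t_inv(p: float, df: int) -> float:
    """Quantile of Student's t by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Inputs from the example: n = 25, sample mean 12%, hypothesized mean 8%, s = 5%.
n, x_bar, mu0, s = 25, 0.12, 0.08, 0.05
df = n - 1
se = s / math.sqrt(n)                     # standard error of the sample mean
t_stat = (x_bar - mu0) / se               # test statistic
crit_one = t_inv(0.95, df)                # one-sided critical t at 95%
crit_two = t_inv(0.975, df)               # two-sided critical t at 95%
p_two = 2 * (1 - t_cdf(abs(t_stat), df))  # two-sided p value

print(round(t_stat, 3))        # 4.0
print(round(crit_one, 3))      # approximately 1.711
print(round(crit_two, 3))      # approximately 2.064
print(round(100 * p_two, 4))   # approximately 0.0527 (percent)
```

Because the test statistic of 4.0 exceeds both critical values, the null is rejected at 95% either way; the p value of roughly 0.0527% is just above 0.05%, which is why the null would not be rejected at 99.95% confidence.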