Hi @tbc1984, just to briefly add to Nicole's links: that copied table of course refers to a sample mean. You will notice that the t-stat is used whenever the variance is unknown. That's the realistic use case: we typically observe samples, and we don't typically know the population's variance. The fact that (outside of a textbook) we are typically substituting the observed sample variance, S^2, for the unknown population variance, σ^2, is the reason the t-stat is used (we lose a degree of freedom). In this way, the Student's t is approximating the normal distribution, and the normal distribution is really what theoretically justifies everything in the chart:
- When sampling from a normal population (top two rows), we would intuitively expect the sample mean to also be normal
- But the central limit theorem (CLT) provides the more interesting (astonishing) insight of the bottom two rows: if the observations are i.i.d. (the key assumption), then even if the population is non-normal, the distribution of the sample mean is approximately normal as the sample gets larger. So notice there is really only one pair of conditions for large samples (which is the only reason I could get a little confused by the table): for large samples, the (i.i.d.) sample mean is approximately normal regardless of the population's distribution, and the z-statistic is justified. But if we lose the degree of freedom (i.e., we don't know the true variance), we should use the Student's t. At the same time, per the footnote, it's okay to use the normal for large samples (because at high df, the normal is approximating the Student's t anyhow ... yes, I do mean to say that the normal is approximating the Student's t, if we don't know the true variance). It's still true that, if you don't have the population variance, the Student's t is the correct choice.

I hope that helps, thanks!
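If it helps to see the small-sample versus large-sample point numerically, here is a rough simulation sketch (not from the table itself). It assumes a standard normal population with true mean 0, computes (sample mean) / (S / sqrt(n)) using the sample standard deviation S as if σ were unknown, and counts how often that statistic exceeds the normal 1.96 cutoff. The function name `rejection_rate`, the trial count, and the seed are my own illustrative choices.

```python
import math
import random


def rejection_rate(n, trials=10000, z_crit=1.96, seed=0):
    """Share of samples whose t-statistic exceeds the normal critical value.

    Assumption for illustration: the population is standard normal (mean 0),
    and sigma is treated as unknown, so the sample standard deviation is used.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance divides by n - 1 (the lost degree of freedom).
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t_stat = mean / (s / math.sqrt(n))
        if abs(t_stat) > z_crit:
            exceed += 1
    return exceed / trials


small = rejection_rate(5)    # df = 4
large = rejection_rate(100)  # df = 99
print(f"n=5:   share beyond 1.96 = {small:.3f}")
print(f"n=100: share beyond 1.96 = {large:.3f}")
```

With n = 5 the share lands well above the nominal 5%, because the statistic follows a Student's t with fat tails; with n = 100 it sits close to 5%, which is why the footnote tolerates the normal for large samples.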