Sampling distribution of OLS estimators

I understand that the assumption that the sampling distribution of the OLS estimators b0 and b1 is asymptotically normal is a key property. However, I'm a bit stuck as to why that is. I assume the magic of the CLT comes into play here, but I guess there are still grey areas for me.
When we apply the CLT, we apply it not to the distribution of the sample, but to the distribution of the sample mean as a random variable.
When we talk about i.i.d. samples of X and Y here, and the corresponding SRF and the b0/b1 estimators, we have a sampling distribution. But how does the CLT, which concerns the sampling distribution of the sample mean, become relevant? I guess what I am trying to express is that we are interested in the sampling distribution of the estimators, not the sampling distribution of the sample mean.

What am I missing? Hope my question makes sense, thanks!


David Harper CFA FRM
Hi Florence,

Yes, that's correct. This is a topic that was better explained in Gujarati, whose textbook was assigned for econometrics before Stock and Watson. There is an indirect sense in which the b(0) and b(1) estimates are a sophisticated sort of sample mean, but they are sample statistics; e.g., each sample produces a different sample regression function with different intercept/slope estimates. You are right, of course, that a single scatterplot (of X,Y pairs) is a single sample. But then there is a set of assumptions that informs the OLS linear regression (technically the Gauss-Markov theorem, https://en.wikipedia.org/wiki/Gauss–Markov_theorem, tells us the implications of these assumptions). Arguably, the key assumptions in the CLRM concern the error term: (1) its variance is presumed constant (i.e., homoskedastic; aka, "identical" over time), (2) it is uncorrelated with the independent/explanatory variable, and (3) it is uncorrelated with itself (i.e., no autocorrelation in the regression; aka, "independent").
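If it helps to see this, here is a quick Monte Carlo sketch (my own illustration, not from the assigned reading; the parameter values are arbitrary). It repeatedly draws samples that satisfy the CLRM error assumptions but with deliberately non-normal (uniform) errors, re-estimates the slope each time, and shows that the sampling distribution of b(1) still centers on the true slope and looks approximately normal:

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_sims = 200, 5000          # observations per sample, number of simulated samples
beta0, beta1 = 1.0, 0.5        # true (population) intercept and slope

b1_estimates = np.empty(n_sims)
for i in range(n_sims):
    x = rng.uniform(0, 10, n)
    # errors are uniform (NOT normal), but homoskedastic, mean-zero,
    # independent of x, and serially independent -- the CLRM assumptions
    eps = rng.uniform(-1, 1, n)
    y = beta0 + beta1 * x + eps
    # OLS slope estimate: sample cov(x, y) / sample var(x)
    b1_estimates[i] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# the estimates cluster tightly around the true slope of 0.5,
# and a histogram of b1_estimates would look approximately normal
print(b1_estimates.mean(), b1_estimates.std())
```

The takeaway is exactly the point above: even though the errors themselves are not normal, the slope estimator is a linear function of those errors, so the CLT pushes its sampling distribution toward normality as n grows.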

The CLT tells us that the average or summation (after all, the sum is merely the average * n) of i.i.d. random variables tends toward normality as the sample size increases. These assumptions about the error term allow the CLT to be applied to the error term; it is approximately normal with mean zero, by construction of the OLS. The regression coefficients are then, actually, linear functions of the error term, so they inherit the normality of the error term. At this point, there is a sense in which they are similar to sample means of a sophisticated sort. The single sample is the scatterplot of pairwise (X,Y) values. We can retrieve the sample mean of X and the sample mean of Y; the regression line will run through this point, by construction. The CLT might tell us about the properties of these sample means of X and Y, but this is not the regression. Additionally, the CLT informs the error of the regression and indirectly the regression coefficients (which are estimates). They each have their own standard errors (i.e., standard deviations applicable to them as sample statistics). If the regression produces a slope estimate of 0.30 with a standard error of 0.15, then by virtue of the OLS assumptions, the CLT is governing our ability to observe that, for a large sample, this slope is (0.30 - 0)/0.15 = 2.0 standard deviations away from zero; aka, different from zero with an exact significance (p value) of about 5%, i.e., roughly 95% confidence, or right on the decision "bubble." So the math is non-trivial but the essence of your logic is correct. I hope that's helpful!
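That last computation can be checked directly. A minimal sketch (using only the standard library; the 0.30 slope and 0.15 standard error are the illustrative numbers from above) computes the t-statistic and its two-sided p value under the asymptotic normal approximation:

```python
from math import erf, sqrt

slope, se = 0.30, 0.15
t = (slope - 0.0) / se            # test statistic against H0: slope = 0

# two-sided p value using the standard normal CDF, Phi(t) = 0.5*(1 + erf(t/sqrt(2)))
p = 2 * (1 - 0.5 * (1 + erf(t / sqrt(2))))

print(t, round(p, 4))             # t = 2.0, p is just under 0.05
```

A p value just under 5% is exactly the "right on the decision bubble" situation: at the conventional 95% confidence level the slope is (barely) significantly different from zero.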