
Is the Standard Error of a Regression Coefficient Unbiased?

bradnhopkins

New Member
Subscriber
So I am reviewing Chapter 7's material on regression. It has honestly been a long while since I've done this deep a dive into econometrics, and I couldn't help noticing that, on the face of it, the variance/standard error of the regression parameters looks biased. That makes zero sense to me, as the OLS estimators are supposed to be BLUE (per the Gauss-Markov theorem). In the derivation of 7.12-7.14, any time we use \[ \sigma_x^2 \], we divide the squared deviations of x only by n and not by n-1. From Chapter 5 we saw that the variance estimator that divides only by n is a biased estimator of the population variance, which leads to my question: I would assume that if we mix a biased estimator (the sample variance of x) with an unbiased one (the sample variance of the residuals), the resulting estimator would also be biased. Is there some nuance I'm just not seeing here in the GARP material (or in Miller, or in Undergraduate Econometrics by Hill/Griffiths/Judge)?
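The Chapter 5 result can be checked with a small simulation. This is a minimal sketch, assuming normally distributed draws with a true variance of 4; the function name `mean_variance_estimates` and the parameter values are hypothetical, chosen only for illustration. Averaged over many samples, the divide-by-n estimator should land near (n-1)/n times the true variance, while the divide-by-(n-1) estimator should land near the true variance itself.

```python
import random

def mean_variance_estimates(n, sigma=2.0, trials=20000, seed=1):
    """Average the divide-by-n and divide-by-(n-1) variance estimators
    over many samples of size n drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    mle_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(x) / n
        ss = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations
        mle_total += ss / n             # divides by n: biased downward
        unbiased_total += ss / (n - 1)  # divides by n - 1: unbiased
    return mle_total / trials, unbiased_total / trials

n = 5
mle_avg, unbiased_avg = mean_variance_estimates(n)
print(f"true variance         : {2.0 ** 2:.3f}")
print(f"avg divide-by-n       : {mle_avg:.3f}  (theory: {(n - 1) / n * 2.0 ** 2:.3f})")
print(f"avg divide-by-(n - 1) : {unbiased_avg:.3f}")
```

With n = 5 the theoretical shrinkage factor is 4/5, so the divide-by-n average should sit near 3.2 rather than 4.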

Thanks...

Edit: cleaned up a typo, and I wanted to add that of course the standard error should be unbiased; it's just that this point isn't readily jumping out at me in the math, and I'd appreciate any clarification.
 

David Harper CFA FRM
Staff member
Subscriber
Hi @bradnhopkins, I haven't carefully analyzed the derivations in GARP's T1.Chapter 7, and you might be correct (this material certainly contains a lot of mistakes), but I will just note: of course the OLS estimators are unbiased. That's the "U" in BLUE, after all: BLUE is the Best (i.e., most efficient) Linear Unbiased Estimator. Further, we definitely estimate the variance of the error by substituting the variance of the residual. The way I was taught, the sample variance of the residual is an estimator of the unobservable population variance of the error; but we might just say that the key step in the regression is substituting the sample variance of the residual for the error variance, and the square root of this sample variance is the highly useful standard error of regression (SER).
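The residual-variance point can be illustrated with a minimal sketch, assuming fixed X values, normal errors with variance 4, and hypothetical true coefficients (intercept 1.0, slope 0.5). The helper `residual_variance_check` is hypothetical, not anything from the GARP text or the learning XLS. Holding X fixed and redrawing the errors, the average of RSS/(n-2) should land near the true error variance, while RSS/n should land near (n-2)/n of it.

```python
import random

def residual_variance_check(x, beta0=1.0, beta1=0.5, sigma=2.0, trials=20000, seed=7):
    """Hold x fixed, redraw the errors, fit OLS by hand, and average
    RSS/(n-2) and RSS/n across trials."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s2_total = 0.0
    naive_total = 0.0
    for _ in range(trials):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b0 = y_bar - b1 * x_bar
        rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        s2_total += rss / (n - 2)  # SER^2: divisor is n minus 2 estimated coefficients
        naive_total += rss / n     # divides by n: biased downward
    return s2_total / trials, naive_total / trials

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
s2_avg, naive_avg = residual_variance_check(x)
print(f"true error variance : {2.0 ** 2:.3f}")
print(f"avg RSS/(n-2)       : {s2_avg:.3f}")
print(f"avg RSS/n           : {naive_avg:.3f}  (theory: {(len(x) - 2) / len(x) * 2.0 ** 2:.3f})")
```

The (n-2) divisor reflects the two coefficients (slope and intercept) estimated before the residuals are formed.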

I am certain, from hand-building regressions (see my learning XLS), that the standard errors of the coefficients are determined by the variance of the error/residual, and this variance is given by RSS/df, where df = n minus the number of estimated coefficients (slope and intercept included). So in a univariate regression, which requires estimating one slope and one intercept coefficient, the error variance is SER^2 = RSS/(n-2). In that sense, I agree with you. However, I notice that 7.13 has "s^2" in the numerator, which, if you look at GARP's formula 7.5 a few pages back, is correctly specified with an (n-2) divisor. In regard to the variance of X, the text does say "large sample estimator of the observations", which appears to be consistent with the rest.

To retrieve the variance of (e.g.) the slope in a univariate regression, my own regression divides SER^2 by Σ(Xi - X_avg)^2, which equals n*variance(MLE, X). So, in my own, it is a biased variance of X, but the key is that SER^2 has an (n-2) divisor. And I never actually thought of this as a biased variance, because the formula really just uses the sum of squared deviations; to me, n*variance(MLE, X) is simply a natural way of writing that sum of squares.

So, on quick inspection, I can't find a way to disagree with 7.12 to 7.14 (although I'm not sure why it's presented this way; I find this derivation tediously theoretical, and I would prefer numerical illustrations of what's important to us as users of regression). I'll look further this week if/when I get a chance. Thanks for your attention to detail,
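Why the n divisor on the variance of X does no harm can be written out. In the standard textbook form (GARP's 7.12-7.14 may lay it out differently),

\[ \widehat{\operatorname{Var}}(b_1) = \frac{s^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{s^2}{n\,\hat{\sigma}_X^2} = \frac{s^2}{(n-1)\,s_X^2}, \qquad s^2 = \frac{RSS}{n-2}, \]

where \( \hat{\sigma}_X^2 \) divides by n and \( s_X^2 \) divides by n-1. Whichever divisor is used for the variance of X, it is multiplied straight back out, so the denominator is just the sum of squares. Conditional on the X values (the setting in which Gauss-Markov unbiasedness is stated), that sum of squares is a fixed number rather than an estimate, so \( E[\widehat{\operatorname{Var}}(b_1) \mid X] = E[s^2 \mid X] / \sum(X_i-\bar{X})^2 = \sigma^2 / \sum(X_i-\bar{X})^2 = \operatorname{Var}(b_1 \mid X) \). The sketch below checks this numerically, assuming fixed X values, normal errors with variance 4, and hypothetical true coefficients; `fit_slope`, `slope_variance_three_ways`, and `monte_carlo` are hypothetical helpers.

```python
import math
import random

def fit_slope(x, y):
    """Hand-built univariate OLS; returns (b0, b1, s2, sxx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of squares of X
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # SER^2 with the (n - 2) divisor
    return b0, b1, s2, sxx

def slope_variance_three_ways(x, s2):
    """The same estimated Var(b1), written with three equivalent denominators."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    var_x_mle = sxx / n             # divides by n ("biased" variance of X)
    var_x_unbiased = sxx / (n - 1)  # divides by n - 1
    return s2 / sxx, s2 / (n * var_x_mle), s2 / ((n - 1) * var_x_unbiased)

def monte_carlo(x, beta0=1.0, beta1=0.5, sigma=2.0, trials=20000, seed=11):
    """Hold x fixed and redraw the errors; compare the average of s2/sxx
    with the true Var(b1 | x) = sigma^2 / sxx and with the spread of b1."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # fixed, because x is fixed
    true_var = sigma ** 2 / sxx
    slopes, var_hats = [], []
    for _ in range(trials):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        _, b1, s2, _ = fit_slope(x, y)
        slopes.append(b1)
        var_hats.append(s2 / sxx)
    mean_b1 = sum(slopes) / trials
    empirical_var = sum((b - mean_b1) ** 2 for b in slopes) / (trials - 1)
    return true_var, sum(var_hats) / trials, empirical_var

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
true_var, avg_var_hat, empirical_var = monte_carlo(x)
print(f"true Var(b1 | x)    : {true_var:.4f}")
print(f"average s^2 / Sxx   : {avg_var_hat:.4f}")
print(f"empirical Var of b1 : {empirical_var:.4f}")

# One fitted sample: the three ways of writing the denominator agree.
rng = random.Random(3)
y = [1.0 + 0.5 * xi + rng.gauss(0.0, 2.0) for xi in x]
_, b1, s2, _ = fit_slope(x, y)
v1, v2, v3 = slope_variance_three_ways(x, s2)
print(f"b1 = {b1:.4f}, SE(b1) = {math.sqrt(v1):.4f}")
print(math.isclose(v1, v2) and math.isclose(v1, v3))  # True
```

Strictly speaking, it is the variance estimator s^2/Σ(Xi - X_avg)^2 that is unbiased; its square root, the standard error, is slightly biased downward (by Jensen's inequality), a gap that shrinks as n grows.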
 