
Question about Calculation of Sample Variance (from 08' Quant Notes pg. 36)

hsuwang

Member
Hello David,

I don't know if I'm allowed to quote from your study notes, I'm really sorry if I am not.

"Let’s take the following series: 10, 12, 14, and 16. The average of the series is (10+12+14+16) / 4 = 13. So, for the population variance, in the numerator we want to sum the squared differences. The population variance is given by [(10-13)^2 + (12-13)^2 + (14-13)^2 + (16-13)^2] / 4 = 20 / 4 = 5. The sample variance has the same numerator and (4-1) for the denominator: 20 / 3 = 6.7."

My question is how do you solve for the sample variance using the Var(x)=[E(X^2)-E(X)^2] formula.
I tried calculating for the population variance (as shown below) and it worked, but how do you do it for the sample variance? Where would the (n-1) part come into the equation? Thank you very much!

Var(pop)=[(10^2+12^2+14^2+16^2)/4]-13^2 = 174-169 = 5
 

David Harper CFA FRM

Staff member
Subscriber
Hi Jack,

Sure, quote away. Mathematically, we can reverse engineer it: since the ratio of sample variance to population variance must be n/(n-1), we can use Sample variance = [E(X^2)-E(X)^2] * n/(n-1). For your series, that is 5 * 4/3 = 6.67, which matches the 20/3 in the notes.
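To check the reverse-engineered form on the series from the notes, here is a minimal Python sketch. The variable names are my own, and the cross-check uses the standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n-1).

```python
from statistics import pvariance, variance

data = [10, 12, 14, 16]
n = len(data)

# "Ex post" moments of the observed series
mean = sum(data) / n                          # 13.0
mean_of_squares = sum(x * x for x in data) / n  # 174.0

# Population form: E(X^2) - E(X)^2
pop_var = mean_of_squares - mean ** 2         # 5.0

# Rescale by n/(n-1) to get the usual (n-1) sample variance
sample_var = pop_var * n / (n - 1)            # about 6.67

print(pop_var, pvariance(data))
print(sample_var, variance(data))
```

Both routes agree: the shortcut formula times n/(n-1) gives the same number as summing squared deviations and dividing by (n-1).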

However, and our notes should do a better job on this (actually, I think it may be a weakness in Gujarati, as he eschews the overly technical, too), I don't think you will ever see this form. Why? It goes to one of the key aspects of Gujarati and inferential statistics: the difference between the "one true" population and the "many" sample statistics. If we take the 6-sided die as a random variable, it has a (population) variance of [E(X^2)-E(X)^2]. End of story for the random variable, because in this case we are lucky enough to know its true characteristics. It makes no sense to adjust the true variance.
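For the die, the true variance can be worked out exactly from the known probabilities. A small sketch, assuming a fair die (each face has probability 1/6); the names are illustrative only:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # assumption: fair die, every face equally likely

e_x = sum(p * x for x in faces)       # E(X)   = 7/2
e_x2 = sum(p * x * x for x in faces)  # E(X^2) = 91/6

# True (population) variance; no n/(n-1) adjustment applies here
die_var = e_x2 - e_x ** 2             # 35/12, roughly 2.92

print(die_var, float(die_var))
```

There is no sample here, so there is no n and nothing to adjust.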

Approached from the other end, and this is our more typical situation, we don't actually know the parameters of the random variable (e.g., we don't know the equity risk premium), so we are trying to infer them from a sample. So, in the case of the 6-sided die, maybe we roll it 20 times. The best we can do is try to learn about the unknowable variable from our observable sample. The recipe that we apply to the sample is called the estimator, and it gives us an estimate. For an "unbiased estimate" we use the sum of squared deviations divided by (n-1). Why? Because it has desirable properties. But we can use other recipes (estimators); e.g., it is not *wrong* to divide by (n), which is the ex post equivalent of [E(X^2)-E(X)^2]. That is a different estimator. I could make up my own estimator, Dave's New Sample Variance Estimator = multiply everything by 9, only it will turn out to be a very lame predictor of the true variance!
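The "desirable properties" can be seen in a quick simulation. This is only a sketch under my own assumptions: a fair die, 20 rolls per sample, 20,000 repeated samples, a fixed seed, and "Dave's estimator" read as 9 times the (n-1) estimate.

```python
import random

TRUE_VAR = 35 / 12          # known variance of a fair die
N_ROLLS = 20
TRIALS = 20_000
rng = random.Random(0)      # fixed seed so the run is repeatable

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

total_unbiased = 0.0
total_divide_by_n = 0.0
for _ in range(TRIALS):
    sample = [rng.randint(1, 6) for _ in range(N_ROLLS)]
    ss = sum_sq_dev(sample)
    total_unbiased += ss / (N_ROLLS - 1)   # divide by (n-1)
    total_divide_by_n += ss / N_ROLLS      # divide by n

avg_unbiased = total_unbiased / TRIALS
avg_divide_by_n = total_divide_by_n / TRIALS
avg_times_nine = 9 * avg_unbiased          # hypothetical "Dave's" estimator

print(TRUE_VAR, avg_unbiased, avg_divide_by_n, avg_times_nine)
```

On average, the (n-1) estimator should land close to the true 35/12, the divide-by-n estimator should sit slightly below it (by a factor of 19/20), and the times-9 recipe should be far off. All three are estimators; they just differ in quality.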

This partly explains the confusion when we see different sample skew/kurtosis formulas: there can in fact be different *estimates* produced by different *estimators*.

But this formula, Var(x)=[E(X^2)-E(X)^2], is correct for the population variance and, unlike sample estimators, does not really call for alternative formulations. I hope that helps.

David
 

hsuwang

Member
Hello David,

Just to clarify: if we are calculating the sample variance of a sample drawn from an unknown population, it would be better to use the typical formula stated on pg. 36, [sig (Xi-Mu)^2]/(n-1), rather than the E(X^2)-E(X)^2 formula (then adjusted by n/(n-1)), right?

I guess there are many ways you can go to get to the same result, but what confuses me is when to use what.

Thanks!
 

David Harper CFA FRM

Staff member
Subscriber
Jack,

Yes, the p. 36 formula is the "unbiased estimate" that is consistent with, say, the J Hull formula for estimating variance, so yes, that is the correct formula for us to use.
The other is sort of funny notation to apply to a sample, but if we disregard that, it will give the same result. So, in terms of this "unbiased (sample) estimate" of variance, there is no conflict: the p. 36 formula is the technically correct (and only, for our purposes) sample variance.

You'll see later, in Hull, that he will "simplify" by replacing the (n-1) by (n) ... that will be a deliberate choice to make the historical volatility easier to calculate...so we just don't want to get too "religious" about it...Thanks, David
 

David Harper CFA FRM

Staff member
Subscriber
....appending...just to clarify what I mean above:

The p. 36 sample variance is the correct sample variance, as given; we are always safe to use this "unbiased" sample variance.

But, just keep in mind, it is merely an estimator. There can be others.

In Hull's volatility chapter, he takes this "correct" sample variance and makes two simplifications:

1. He replaces (n-1) with (n). And now we have an easier sample variance (i.e., it is consistent with your other form, Jack!). Is it wrong? No, it is perhaps a technically slightly inferior variance estimator, but a valid estimator nonetheless. In this way, for example, in Excel, it is not incorrect to use =VARP() on a sample rather than =VAR().

2. Then he assumes the mean return = 0. Now he has further simplified the "mere estimator." Whereas the first simplification is quite safe (as he footnotes, he goes from the unbiased to the MLE estimator), this simplification is more like "close enough for government work." Statistically, this won't have the desirable properties of the original, but it is also a valid estimator of the true, unknowable variance. It trades a bit of imprecision for the supreme convenience of being able to say: the variance is just the average of the squared returns.
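The two simplifications can be put side by side in a short sketch. The return series below is made up purely for illustration, and the Python functions stand in for the Excel ones: `statistics.variance` divides by (n-1) like =VAR(), and `statistics.pvariance` divides by n like =VARP().

```python
from statistics import mean, pvariance, variance

# Hypothetical daily returns, invented for this example
returns = [0.012, -0.008, 0.005, -0.015, 0.009, 0.001, -0.004, 0.011]
n = len(returns)

# Starting point: the "unbiased" sample variance, divisor (n-1)
var_unbiased = variance(returns)

# Simplification 1: replace (n-1) with n (the MLE-style estimator)
var_mle = pvariance(returns)

# Simplification 2: also assume the mean return is zero,
# so the estimate is just the average squared return
var_zero_mean = sum(r * r for r in returns) / n

print(var_unbiased, var_mle, var_zero_mean)
```

The first two differ only by the factor (n-1)/n. The zero-mean version equals the divide-by-n version plus the squared sample mean, so the gap is tiny whenever the average return is small relative to its volatility, which is the usual case for daily returns.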

David
 