
Practice Questions 2008 #25

hsuwang

Member
Hello David,
I'm sorry but when I first saw this question, I got totally thrown off, and it really confuses me. (I'm sorry because I know this should be really basic, and I'm not getting it..)

You have a sample of ten data elements (n=10). You accidentally compute a population standard deviation and find it equals two (2). What is the sample standard deviation?

I don't know why, but the first thing that came to mind when I saw this question was to use the Sigma/sqrt(n) formula to get the sample standard deviation. So I basically did 2/sqrt(10), and of course that was wrong. But can you please tell me what is wrong with doing it this way? It is really messing up my concepts about population and sample variance. Thanks.

ANSWER:
The population variance = 2^2 = 4. This means the sum of squared deviations = 4*10 = 40. Divide 40 by (n-1) = 9 to get the sample variance, which equals 4.44. So the sample standard deviation = SQRT(4.44) = 2.11. In short, sample variance = population variance * n/(n-1).

Also, I went over the 2008 quant notes and wasn't able to find the above formula.
 

David Harper CFA FRM

Staff member
Subscriber
Hi Jack,

You won't find the formula anywhere; if I do say so myself, it's a good example of a "sneaky but fair" question that can be asked (see L1.03 - to my thinking, this 3rd sample exam question is a fair question that does not necessarily have an exactly matching formula) because it employs other formulas. Classic FRM way...

2/sqrt(10) is the standard error; i.e., the standard deviation of the sample mean (or of the sampling distribution of the sample mean). So, that is a fine instinct. It is a standard deviation (it goes by the name "standard error", and the difference is semantic), but it is the standard deviation of the random variable that is the sample mean. Say the 10 data elements are rolls of a die. If the mean of these 10 rolls were, say, 3.2, that's sample mean #1. Roll 10 again: get 3.5. That's sample mean #2. Repeat this many times and you have a different distribution: the distribution of sample means. The standard error is the *expected* standard deviation of that distribution. See how the sample mean itself has variation; the standard error is the standard deviation of that sample mean.
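
To make the "distribution of sample means" concrete, here is a rough simulation sketch. It assumes a fair six-sided die, and the function name, seed, and number of repetitions are illustrative choices only, not part of the original question:

```python
import math
import random
import statistics

def simulate_sample_means(n_rolls=10, n_samples=20000, seed=42):
    """Roll a fair die n_rolls times, record the mean, and repeat n_samples times."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
        means.append(statistics.fmean(rolls))
    return means

# Population standard deviation of a single fair die roll: sqrt(35/12), about 1.708.
sigma = math.sqrt(sum((face - 3.5) ** 2 for face in range(1, 7)) / 6)

# Theoretical standard error of the mean of 10 rolls: sigma / sqrt(n), about 0.540.
standard_error = sigma / math.sqrt(10)

# Dispersion of the simulated sample means; it should land close to standard_error.
sample_means = simulate_sample_means()
observed = statistics.pstdev(sample_means)

print(f"theoretical standard error: {standard_error:.3f}")
print(f"std dev of simulated means: {observed:.3f}")
```

The two printed numbers should be close: the standard error describes how much the *mean* of 10 rolls moves from sample to sample, not how spread out the 10 rolls are within one sample.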

But 2.11 is (more simply) the observed sample standard deviation of the dataset consisting of 10 rolls. The 10 rolls constitute an empirical distribution (imagine its histogram) and 2.11 is the sample standard deviation of that single distribution (as opposed to the standard error, which measures the variation in *means* as you roll several sets of 10 and compute the mean of each).
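
As a quick check of the arithmetic in the answer, the conversion can be sketched like this (the function name is just an illustrative label, not something from the notes):

```python
import math

def sample_sd_from_population_sd(pop_sd, n):
    """Convert a standard deviation computed with divisor n into one with divisor n - 1."""
    sum_sq_dev = pop_sd ** 2 * n           # undo the division by n: 4 * 10 = 40
    return math.sqrt(sum_sq_dev / (n - 1)) # divide by n - 1 instead: sqrt(40 / 9)

sample_sd = sample_sd_from_population_sd(2.0, 10)
standard_error = 2.0 / math.sqrt(10)       # the tempting but different quantity

print(round(sample_sd, 2))       # 2.11
print(round(standard_error, 2))  # 0.63
```

The first number answers the question; the second is the standard error of the mean, which answers a different question.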

David
 

hsuwang

Member
Hello David,

Thanks for the clarification, that really helped! However, as I'm studying the SER of a regression, I'm confused again. You illustrated how standard error is different from sample standard deviation by using the die-rolling example: the standard error is the variation in the sample mean. How, then, would you explain the standard error of the regression (SER), which is the standard deviation of the Y values around the regression line? It seems to me that it has nothing to do with variation in the sample mean. I don't know if I'm being clear about my confusion. Thank you!

Jack
 

David Harper CFA FRM

Staff member
Subscriber
Hi Jack,

Right, SER is another measure which is different, though more closely related to the sample standard deviation. The key difference is: an OLS regression line has been introduced. So, continuing the example, suppose a six-sided die is rolled 10 times and we plot the results {1,2,3,4,5, or 6} on the Y axis and the roll # on the X axis {1,2,...,9,10}. Then draw an OLS line through the scatter cloud, maybe wondering if the die roll tends to increase with the number of rolls. Now we can refer to two different standard deviations: the first is the "plain old" sample standard deviation of the 10 rolls (a measure of average dispersion from the *average* Y value); the second is the SER, a measure of average dispersion from the OLS regression line. For this reason, the SER is arguably a better measure of "goodness of fit" than the R^2.

And, note their similarity, in a way: if there is no relationship between X and Y (we would not expect one here), the regression line would be flat at the average Y (3.5) and the SER would be nearly the same as the sample standard deviation (the only difference being due to the n-2 denominator rather than n-1).
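
Here is a small sketch of that comparison. The 10 rolls are made-up values, chosen to be symmetric around the middle roll so that the fitted slope is exactly zero; the helper names are illustrative only:

```python
import math
import statistics

def ols_fit(x, y):
    """Return (intercept, slope) of the simple OLS line of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def standard_error_of_regression(x, y):
    """SER: square root of the sum of squared residuals divided by n - 2."""
    intercept, slope = ols_fit(x, y)
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ssr / (len(x) - 2))

roll_number = list(range(1, 11))
rolls = [1, 2, 6, 3, 3, 3, 3, 6, 2, 1]   # hypothetical rolls with no trend

intercept, slope = ols_fit(roll_number, rolls)
sample_sd = statistics.stdev(rolls)       # dispersion around the average Y, divisor n - 1
ser = standard_error_of_regression(roll_number, rolls)  # dispersion around the line, divisor n - 2

print(f"slope     = {slope:.3f}")
print(f"sample sd = {sample_sd:.3f}")
print(f"SER       = {ser:.3f}")
```

With a flat line the residuals are just the deviations from the average Y, so the SER equals the sample standard deviation times sqrt((n-1)/(n-2)); with real data that has a trend, the SER would fall below that.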

And, again a tip here: we do not need to be distracted by the term "standard error." An SE is a standard deviation, just in the context of variation/dispersion that is created by sampling variation (even if the population is perfectly stable, each random sample can be expected to show differences; even if the population is perfectly fixed, each sample regression function will produce a different OLS line, such that the slope and intercept are themselves random variables).
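
To see the slope behaving as a random variable, one could simulate many sets of 10 rolls and fit a line to each. This is only a sketch under the assumption of a fair die; the seed, repetition count, and helper name are arbitrary choices:

```python
import math
import random
import statistics

def ols_slope(x, y):
    """Slope of the simple OLS line of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(7)
roll_number = list(range(1, 11))

# Each sample of 10 rolls yields its own sample regression line, hence its own slope.
slopes = []
for _ in range(5000):
    rolls = [rng.randint(1, 6) for _ in roll_number]
    slopes.append(ols_slope(roll_number, rolls))

mean_slope = statistics.fmean(slopes)
sd_slope = statistics.stdev(slopes)

# Theoretical standard error of the slope when the true slope is zero:
# sigma / sqrt(sum of squared deviations of X), with sigma = sqrt(35/12) for a fair die.
sxx = sum((xi - 5.5) ** 2 for xi in roll_number)
theoretical_se = math.sqrt(35 / 12) / math.sqrt(sxx)

print(f"mean of simulated slopes   : {mean_slope:.4f}")
print(f"std dev of simulated slopes: {sd_slope:.4f}")
print(f"theoretical SE of the slope: {theoretical_se:.4f}")
```

The population (a fair die) never changes, yet the fitted slopes scatter around zero; the standard deviation of that scatter is the standard error of the slope.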

David
 