Question about Bionic Turtle's 2009 FRM Program
07 Jan 2009
Welcome recent members! In case you do not have ready access to them:
Episode #3 is called Quant C. It reviews the final four assigned chapters (chapters five through eight) in Essentials of Econometrics. As usual, members can access on the premium page; non-members can sample the first 10 minutes or so here.
I would like to highlight just a few things about this episode, in case you find my highlights helpful:
The focus is on three distributions:
Student's t: used to test the sample mean against a hypothetical population mean ("our sample mean is X, but what is the true mean?"). Why do we use the Student's t instead of the normal? Because we don't know the population variance and must estimate it from the sample. For large samples, why doesn't it matter? Because as n increases, the Student's t converges to the normal anyhow.
Chi-squared: used to test sample variance against a hypothetical population variance ("our sample variance is X, but what is the true variance?").
F distribution: we have two uses for the F distribution. One, to identify whether two sample variances come from the same population (or populations with the same variance). Two, to test the joint (null) hypothesis that all partial slope coefficients in a multiple regression are simultaneously equal to zero (i.e., jointly insignificant).
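To make the three tests concrete, here is a minimal sketch of the test statistic behind each one. The function names and all input numbers are hypothetical, chosen only for illustration; they are not taken from the episode or from Gujarati. Each statistic would then be compared against a critical value from the matching distribution.

```python
import math


def t_statistic(sample_mean, hypothesized_mean, sample_std, n):
    """Student's t statistic (n - 1 degrees of freedom) for a sample mean."""
    return (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(n))


def chi_squared_statistic(sample_variance, hypothesized_variance, n):
    """Chi-squared statistic (n - 1 degrees of freedom) for a sample variance."""
    return (n - 1) * sample_variance / hypothesized_variance


def f_statistic(sample_variance_a, sample_variance_b):
    """F ratio of two sample variances, larger variance in the numerator."""
    larger = max(sample_variance_a, sample_variance_b)
    smaller = min(sample_variance_a, sample_variance_b)
    return larger / smaller


# Hypothetical inputs, for illustration only.
t_stat = t_statistic(sample_mean=0.02, hypothesized_mean=0.0, sample_std=0.05, n=25)
chi2_stat = chi_squared_statistic(sample_variance=0.0025, hypothesized_variance=0.0020, n=25)
f_stat = f_statistic(0.0025, 0.0036)

print(f"t = {t_stat:.2f}, chi-squared = {chi2_stat:.2f}, F = {f_stat:.2f}")
```

Putting the larger variance in the numerator of the F ratio is a common convention so that only the upper tail of the F table is needed.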
The regression is easier to follow, I think, if you take a moment to hover on the definitions. See slide below.
The PRF is the "ideal" that we never observe; e.g., we do not have the time/resources to collect the entire population. Without the random error term (u), this PRF is merely a mathematical model. But we know reality is not so exact (deterministic). So, add the (u) and we have the econometric (or statistical) PRF model. But it is still the unobserved ideal. We are trying to discern it...
So we draw samples. That gets us the SRF, which is a straight line. Draw another sample and we will get a different SRF (that's why the estimators have standard errors: each sample gives a different estimate!). But actual observations do not fall on the line. The difference between an actual observation and the regression line is the residual (e). Notice how the a priori stochastic error term (u) corresponds to the observed, a posteriori residual (e).
Finally, we want to know the population parameters (capital B1 and capital B2) but we are forced to use the sample to "guess at them" with the estimators (small b1 and small b2).
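The idea that every sample yields a different SRF can be seen in a small simulation. This is only a sketch under stated assumptions: the population parameters B1 = 2.0 and B2 = 0.5, the error volatility, and the X values are all hypothetical, and the errors are assumed normal. It draws many samples from the same PRF, fits a line to each, and compares the spread of the slope estimates (b2) with the textbook standard error of the slope.

```python
import math
import random
import statistics


def ols(xs, ys):
    """Return (b1, b2): intercept and slope of the least-squares line."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical population regression function: Y = B1 + B2*X + u
B1, B2, SIGMA_U = 2.0, 0.5, 1.0
rng = random.Random(42)          # fixed seed so the run is repeatable
xs = list(range(1, 21))          # the same 20 X values in every sample

slopes = []
for _ in range(1000):
    # Each pass is a new sample: same PRF, fresh random errors u.
    ys = [B1 + B2 * x + rng.gauss(0.0, SIGMA_U) for x in xs]
    _, b2 = ols(xs, ys)
    slopes.append(b2)

mean_slope = statistics.mean(slopes)
sd_slope = statistics.stdev(slopes)

# Standard error of the slope when the error volatility is known.
x_bar = statistics.mean(xs)
theoretical_se = SIGMA_U / math.sqrt(sum((x - x_bar) ** 2 for x in xs))

print(f"average b2 across samples: {mean_slope:.4f} (true B2 = {B2})")
print(f"std dev of b2 across samples: {sd_slope:.4f}")
print(f"theoretical standard error:  {theoretical_se:.4f}")
```

The standard deviation of b2 across the simulated samples should land close to the theoretical standard error, which is exactly what the standard error in a regression output is estimating from a single sample.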
The sample regression function (SRF) produces coefficients. As below, in the case of Gujarati's lotto example, the SRF returns an intercept and a slope coefficient.
| | Coefficients | Standard error |
| Intercept | -3152.7 | 513.5 |
| X Variable | 25.4 | 3.3 |
Each coefficient has a standard error. This standard error is the standard deviation of the coefficient's sampling distribution. The construction of the confidence interval is always the same, whether the estimate is a sample mean or a regression coefficient:
[sample estimate] +/- [critical t value] x [standard error]
Much of the regression material is a variation on this idea. The critical t is a function of confidence and sample size. Higher (lower) confidence implies a wider (narrower) interval; a smaller (larger) sample implies a wider (narrower) interval. A sample of 10 at 95% confidence gives a critical t of 2.26 (two-tailed, 9 degrees of freedom). In other words, a "standard" Student's t random variable would fall within +/- 2.26 of zero 95% of the time. We only need to un-standardize (convert to our units) by multiplying by the standard error.
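As a rough illustration, the same recipe can be applied to the lotto coefficients in the table above. This sketch assumes 10 degrees of freedom, matching the residual df shown in the sum-of-squares table for this regression, and it hard-codes the two-tailed 95% critical t of 2.228 from a standard t table rather than computing it. The function name is hypothetical.

```python
def confidence_interval(estimate, std_error, critical_t):
    """Return (lower, upper) = estimate -/+ critical_t * std_error."""
    half_width = critical_t * std_error
    return estimate - half_width, estimate + half_width


# Two-tailed 95% critical t for 10 degrees of freedom (table lookup).
# Assumption: n = 12 observations less 2 estimated coefficients.
CRITICAL_T_DF10 = 2.228

slope_low, slope_high = confidence_interval(25.4, 3.3, CRITICAL_T_DF10)
intercept_low, intercept_high = confidence_interval(-3152.7, 513.5, CRITICAL_T_DF10)

print(f"slope 95% CI:     [{slope_low:.2f}, {slope_high:.2f}]")
print(f"intercept 95% CI: [{intercept_low:.1f}, {intercept_high:.1f}]")
```

Under these assumptions the slope interval runs from roughly 18.0 to 32.8. Because it excludes zero, the slope would be judged significantly different from zero at 95% confidence.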
The regression also gives us the sum of squares:
| | df | SS |
| Explained (ESS) | 1 | 1,504,991 |
| Residual (RSS) | 10 | 256,701 |
| Total (TSS) | 11 | 1,761,692 |
We care about this for two reasons:
One, the coefficient of determination (R^2) is ESS/TSS. And, you can see the formula pretty much tells you what R^2 defines: the proportion of variation in the dependent variable that is explained by the independent variable.
Two, to connect the RSS to the standard error of the regression (SER), which is also called the standard error of the estimate (if you sat for the FRM last year, this was called the SEE). Specifically: SER = SQRT[RSS/(n-2)]
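Both quantities follow directly from the sum-of-squares table above. A minimal sketch, assuming n = 12 (total degrees of freedom of 11, plus one):

```python
import math

# Sum of squares from the regression output above.
ESS = 1_504_991   # explained
RSS = 256_701     # residual
TSS = 1_761_692   # total

n = 12            # assumption: total df of 11 implies 12 observations

# Coefficient of determination: share of total variation that is explained.
r_squared = ESS / TSS

# Standard error of the regression: RSS divided by its degrees of freedom
# (n - 2 in the two-variable model), then the square root.
ser = math.sqrt(RSS / (n - 2))

print(f"R^2 = {r_squared:.4f}")
print(f"SER = {ser:.2f}")
```

This gives an R^2 of about 0.854 and an SER of about 160.2, the latter expressed in the units of the dependent variable.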
I uploaded new EditGrid spreadsheets to the member area. Somebody asked how to get MS Excel versions; you can open these workbooks directly into Excel! These "learning worksheets" can be accessed in three ways.
For episode #3, I uploaded the following spreadsheets:
Paid members access the screencast in the member section. In addition to the viewable screencast:
Non-members can sample the start of the screencast tutorial here.
I wrote the following questions to keep you engaged in this episode. Questions 3 and 4 are special because I took Gujarati's datasets and annotated the answers in the EditGrid spreadsheet. You will do me a big favor by engaging with the worksheets. I believe they may help put you on the path to regression mastery. How will you learn about the confidence interval? By reading about it passively, or by trying to construct it yourself?
You sample monthly hedge fund returns for one year (n=12) and find a sample mean of +1.0% with sample variance of 0.01% (0.0001). You want to determine the population mean.
(i) What determines which distribution we use?
(ii) What is the 95% confidence interval?
(iii) Our null hypothesis is "the population mean = 0." What is our verdict?
(iv) Which error can we commit, Type I, II or both?
(v) If we observed the same sample mean and variance, but for n=48, what is the difference to the confidence interval?
Based on a sample of twenty (n=20 months), we find the sample volatility of a fund to be 25%. The manager claims the "true" (population) volatility is only 20%.
(i) What is the p value?
(ii) Translate the p value into a statement about accepting/rejecting the null hypothesis.
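If you want to check your own answer to the volatility question, here is one possible sketch using only the Python standard library. It assumes a one-sided test (the alternative being that the true variance exceeds the manager's claim) and computes the chi-squared upper tail with a series expansion of the incomplete gamma function. The helper name is hypothetical, and this is not necessarily the approach used in the worksheets.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom exceeds x).

    Uses the series expansion of the regularized lower incomplete gamma
    function, then takes the complement.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0 / a
    total = term
    for k in range(1, 1000):
        term *= z / (a + k)
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


n = 20
sample_vol = 0.25     # sample standard deviation
claimed_vol = 0.20    # hypothesized population standard deviation

# Test statistic: (n - 1) * sample variance / hypothesized variance
chi2_stat = (n - 1) * sample_vol ** 2 / claimed_vol ** 2
p_value = chi2_upper_tail(chi2_stat, df=n - 1)

print(f"chi-squared statistic = {chi2_stat:.4f} with {n - 1} df")
print(f"one-sided p value = {p_value:.4f}")
```

The statistic is about 29.69 on 19 degrees of freedom, which sits between the tabulated 10% and 5% critical values (27.20 and 30.14), so the one-sided p value falls between 0.05 and 0.10.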
I loaded Gujarati's Table 6-10 into an EditGrid spreadsheet. Please view by clicking here. (The spreadsheet has two tabs: the first tab contains the questions and the regression dataset; the second tab contains the answers)
For 12 years (1990 to 2001), we will regress the S&P 500 Index (the dependent variable) on consumer price index (CPI; the independent or explanatory variable).
The proposed two-variable linear regression model is therefore:
[S&P] = B1 + B2[CPI] + u
(i) What is the SRF?
(ii) Produce a 95% confidence interval (CI) for the slope coefficient.
(iii) Calculate the R^2 (coefficient of determination) using the sum of squares. What does it mean?
(iv) Calculate the standard error of the regression (a.k.a., standard error of estimate). What does it mean?
(v) Tough: Given a CPI of 180 (i.e., independent variable = 180), what is the 95% confidence interval around the PREDICTED S&P value (the predicted Y)?
I loaded Gujarati's Table 11-7 into an EditGrid spreadsheet. Please view by clicking here. (the second tab contains the answers.)
There are 28 observations (years 1954 to 1981). The proposed regression model has three variables. Specifically, annual STOCK RETURNS (Y, the dependent variable) are regressed on OUTPUT GROWTH (X2, an independent variable) and INFLATION (X3, another independent variable).
(i) Given the SRF, what is the first derivative with respect to inflation?
(ii) Our hypothesis is that the output growth parameter is six (6). What is the p value of the test? Interpret the p value.
(iii) Conduct a joint hypothesis test: are both partial slope coefficients statistically significant?
(iv) What is the adjusted R^2 and why is it better than the unadjusted R^2?
Thanks very much. I will write again in two weeks with another newsletter.
David Harper, CFA, FRM, CIPM
Founder
www.bionicturtle.com
P.S. I don't want to spam you!