Jan 28

FRM 2008 Early Bird Episode #3

by David Harper, CFA, FRM, CIPM




Contents

  • Learning Objectives
  • Video tutorial
  • Practice Questions (3)

Learning Objectives


Hello from Bionic Turtle! This week's Early Bird episode introduces linear regression. Your "homework" is to view the 20-minute tutorial and, more importantly, to work the three practice questions below.

In this episode, we review the following ideas:

  • The regression equation is y = mx + b + e, where e is the error term (the fitted line itself is mx + b)
  • (m) is the slope and (b) is the intercept; the error term (e) reminds us that actual data do not fit the regression line perfectly
  • How is the line fitted to the data? Ordinary least squares (OLS) chooses the slope and intercept that produce the lowest (minimum) possible sum of squared errors (SSE)
  • In the tutorial, we decompose the total variation into two "pieces": the regression sum of squares (SSR) and the sum of squared errors (SSE). That is, SSR + SSE = SST (total sum of squares).
  • The coefficient of determination (R-squared) = SSR/SST. In a regression with a single independent variable, R-squared is the square of the correlation coefficient (r).
  • The standard error of the estimate (SEE) is basically the standard deviation of the error term: it quantifies the dispersion of observations above/below the regression line.
  • The difference between SSE and SEE? SSE is expressed in squared units and grows with the number of observations, so it is difficult to interpret. SEE is expressed in the same units as y, so we can visualize it (in my tutorial example, 1.5 units along the y-axis). Specifically, SEE = SQRT[SSE/(n-2)]. If you look closely, you'll see this is nearly a standard deviation calculation, except that we divide by (n-2) because two parameters (slope and intercept) are estimated.
  • Both R-squared and SEE tell us something about the fit of the line to the data: R-squared tells us how much of the variation in the dependent variable is explained by the independent variable, while the standard error of the estimate (SEE) measures the dispersion of the errors (residuals).
  • Finally, this is only the second step (the first step is to look at the scatterplot to see whether a linear relationship exists). This is not yet significance testing. Significance testing asks questions like, "are the coefficients meaningfully nonzero, or could they be due to chance, given the sample size?" Producing the parameter estimates (e.g., slope and intercept) is one step; testing them for significance is an additional step.
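To make the bullets above concrete, here is a minimal Python sketch (my own illustration, not taken from the tutorial) that fits y = mx + b by OLS and then computes SSE, SSR, SST, R-squared, r and SEE. The six (x, y) points are made-up numbers used only for illustration, and the helper name simple_ols is hypothetical.

```python
import math


def simple_ols(x, y):
    """Fit y = m*x + b by ordinary least squares and return fit statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-deviations and squared deviations around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)

    m = sxy / sxx              # OLS slope
    b = y_bar - m * x_bar      # OLS line passes through (x_bar, y_bar)
    y_hat = [m * xi + b for xi in x]

    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
    sst = syy                                               # total = SSR + SSE
    return {
        "m": m,
        "b": b,
        "SSE": sse,
        "SSR": ssr,
        "SST": sst,
        "R2": ssr / sst,
        "r": sxy / math.sqrt(sxx * syy),
        "SEE": math.sqrt(sse / (n - 2)),  # two parameters estimated
    }


# Made-up data, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.5, 3.5, 6.0, 7.5, 6.5]
fit = simple_ols(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

If you run it, you should find that SSR + SSE equals SST and that R-squared equals r squared, just as the bullets say.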

Video Tutorial


The 30-minute video tutorial is located here (with a table of contents).

If you are a paid member, you can also access this in the member section, where you will also find the downloadable slides and an iPod-format file.

Practice Questions

Here are three practice questions. Again, I developed these especially to provoke learning. Each takes some work, but I think they will really help you practice some key concepts!

Question #1 (historical correlation)


Last week, we observed the following (hypothetical) daily close prices for oil and gold (in Euros):

  • Gold (5-day series): €15, 16, 17, 18, 19
  • Oil (5-day series): €50, 60, 55, 50, 60

For example, on Monday, gold closed at 15 Euros/gram and oil closed at 50 Euros per barrel.

(i) For this five-day series, what is the covariance between oil and gold?
(ii) What is the population standard deviation of the gold series?
(iii) What is the sample standard deviation of the gold series?
(iv) Based on a sample covariance and sample standard deviations, what is the correlation (coefficient) between gold and oil prices?
(v) Bonus: if you instead used population covariance and population standard deviations, would the correlation be lower or higher?
(vi) Bonus: if you square the correlation, what would that number be called and what does it mean?
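After you have worked Question #1 by hand, a short Python sketch like the one below can check your arithmetic. Because part (i) does not say whether it wants a sample or a population covariance, the sketch computes both. The helper name covariance is my own; the standard deviations come from the standard-library statistics module (stdev divides by n-1, pstdev by n).

```python
import statistics

# Question #1 data (euros)
gold = [15, 16, 17, 18, 19]
oil = [50, 60, 55, 50, 60]


def covariance(x, y, sample=True):
    """Covariance of two equal-length series (divide by n-1 if sample, else n)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cross / (n - 1 if sample else n)


cov_sample = covariance(gold, oil, sample=True)
cov_pop = covariance(gold, oil, sample=False)

sd_gold_pop = statistics.pstdev(gold)     # (ii) population standard deviation
sd_gold_sample = statistics.stdev(gold)   # (iii) sample standard deviation

# (iv) and (v): correlation on a sample basis vs. a population basis
corr_sample = cov_sample / (statistics.stdev(gold) * statistics.stdev(oil))
corr_pop = cov_pop / (statistics.pstdev(gold) * statistics.pstdev(oil))

print(f"Covariance: sample {cov_sample:.4f}, population {cov_pop:.4f}")
print(f"Gold std dev: population {sd_gold_pop:.4f}, sample {sd_gold_sample:.4f}")
print(f"Correlation: sample {corr_sample:.4f}, population {corr_pop:.4f}")
print(f"Correlation squared: {corr_sample ** 2:.4f}")
```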

 

Question #2 (probabilistic correlation)


Next year, the economy will be in one of three states: low growth (30% likely), moderate growth (60% likely) and high growth (10% likely). Assume both Google's (G) and Yahoo's (Y) sales are a function of the economic state, such that:

  • If economic growth is low (30% likely), Google (G) will grow 20% and Yahoo (Y) will grow 4%
  • If economic growth is moderate (60% likely), Google will grow 30% and Yahoo will grow 4%
  • If economic growth is high (10% likely), Google will grow 60% and Yahoo will grow 10%

(i) What is Google's expected growth?
(ii) What is the covariance between Google (G) and Yahoo (Y)?
(iii) What is Google's variance?
(iv) What is the correlation (coefficient) between Google and Yahoo?
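In Question #2, every moment is probability-weighted across the three economic states, so there is no n or n-1 divisor. The Python sketch below, which you can use to check your hand calculation, encodes each state as a (probability, Google growth, Yahoo growth) tuple; that layout and the variable names are my own choices.

```python
import math

# (probability, Google growth, Yahoo growth) for each economic state
states = [
    (0.30, 0.20, 0.04),  # low growth
    (0.60, 0.30, 0.04),  # moderate growth
    (0.10, 0.60, 0.10),  # high growth
]

probs = [p for p, _, _ in states]

# (i) expected growth = sum of probability * outcome
e_g = sum(p * g for p, g, _ in states)
e_y = sum(p * y for p, _, y in states)

# (ii) covariance = probability-weighted product of deviations from the means
cov_gy = sum(p * (g - e_g) * (y - e_y) for p, g, y in states)

# (iii) variances = probability-weighted squared deviations
var_g = sum(p * (g - e_g) ** 2 for p, g, _ in states)
var_y = sum(p * (y - e_y) ** 2 for p, _, y in states)

# (iv) correlation = covariance / (std dev of G * std dev of Y)
corr_gy = cov_gy / math.sqrt(var_g * var_y)

print(f"E[G] = {e_g:.4f}, E[Y] = {e_y:.4f}")
print(f"Cov(G, Y) = {cov_gy:.6f}, Var(G) = {var_g:.4f}")
print(f"Correlation = {corr_gy:.4f}")
```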

Question #3 (Linear regression)


We regressed the monthly returns of two HFRI Hedge Fund Indices (actual data as of Jan 2008, not hypothetical): the Equity Hedge Index (E) and the Distressed Securities Index (D). We treat the Equity Hedge Index (E) as the independent variable and the Distressed Securities Index (D) as the dependent variable.

Here are the results of the analysis for the five-year period ending Dec 2007:

  • n = 60 (sample size of 60 monthly returns, or 5 years)
  • Covariance (E,D) = 0.00013 (or 0.013%)
  • Variance (E, independent) = 0.00026 or 0.026%
  • Variance (D, dependent) = 0.00014 or 0.014%
  • y-intercept (D-intercept) = 0.0072 or 0.72% (if y = mx + b is the linear regression, then b = 0.72%)

(i) What is the slope of the regression line? Put another way, what is (m) in the equation y = mx + b (or D = mE + b)?
(ii) If the independent variable (E) equals 5%, what does the regression line predict for the dependent variable (D)?
(iii) What is the correlation coefficient between E and D?
(iv) What is the coefficient of determination (r^2) for the regression line?
(v) Bonus: What is another way to calculate the coefficient of determination if you are given, for example, that both SSE and SSR equal 0.004?
(vi) Bonus: Is the correlation coefficient (r) significant?
(vii) Gold star bonus (bionic turtle tough!): If the sum of squared errors (SSE) is 0.004, what is a 95% confidence interval that bounds the projected dependent variable (D) if E equals 5%?
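The Question #3 calculations can also be sketched in Python as a check on your work, under a few stated assumptions: it takes the intercept b = 0.0072 from the list above; it treats the SSE and SSR value of 0.004 from parts (v) and (vii) as given; for part (vi) it uses the standard t-statistic for a correlation, t = r × SQRT(n-2) / SQRT(1 - r^2); and for part (vii) it uses the simple band "prediction ± SEE × NORMSINV(97.5%)" that appears in the comments below, rather than a full prediction interval (which would use a t critical value and a forecast standard error that also reflects uncertainty in the slope and intercept).

```python
import math
from statistics import NormalDist

# Inputs from Question #3 (monthly returns as decimals)
n = 60
cov_ed = 0.00013
var_e = 0.00026
var_d = 0.00014
b = 0.0072      # y-intercept, taken from the bullet list
sse = 0.004     # SSE given in parts (v) and (vii)
ssr = 0.004     # SSR given in part (v)

m = cov_ed / var_e                      # (i) slope = Cov(E, D) / Var(E)
d_hat = m * 0.05 + b                    # (ii) predicted D when E = 5%
r = cov_ed / math.sqrt(var_e * var_d)   # (iii) correlation coefficient
r_squared = r ** 2                      # (iv) coefficient of determination
r_squared_alt = ssr / (ssr + sse)       # (v) SSR / SST, where SST = SSR + SSE

# (vi) t-statistic for H0: correlation = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# (vii) band of +/- z * SEE around the prediction, z = NORMSINV(97.5%)
see = math.sqrt(sse / (n - 2))
z = NormalDist().inv_cdf(0.975)         # about 1.96
lower, upper = d_hat - z * see, d_hat + z * see

print(f"slope m = {m:.4f}, predicted D = {d_hat:.4%}")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, SSR/SST = {r_squared_alt:.4f}")
print(f"t-statistic = {t_stat:.2f}")
print(f"SEE = {see:.4%}, band = [{lower:.4%}, {upper:.4%}]")
```

With 58 degrees of freedom, the two-tailed 5% critical t-value is roughly 2.0, so compare the t-statistic against that for part (vi).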

Answers to practice questions

That's all for this week. Good luck and see you next week!

David Harper, CFA, FRM, CIPM
Founder

www.bionicturtle.com


Comments

  1. Question no. 2 is linked to answer 3. I want to know the answers to question no. 2.

    Thanks

  2. Neha - I fixed it. FYI, all Early Bird answers are itemized at this forum thread

  3. Hi David

    I would like to suggest a small update to Question no. 2 of this tutorial. The question states the probabilities of the economic growth states as (30, 50 and 20), but the answer has been resolved with (30, 60 and 10). It does not make any difference at this point, since we are learning the concepts and only peeked to check the answer.

    Hope I am not giving you too much trouble.

    Sorry I am late on this! I was revisiting the practice exercises.

    Regards
    Neha

  4. Hi Neha,

    You are correct, again. It is the opposite of trouble, it is very helpful! Thank you, David

  5. Hi David

    I noticed that the (vii) part of the Question 3 of this episode mentions 95% confidence interval whereas in the answer section we have used 97.5% confidence interval.

    You have input the correct value for a 95% CI, which is 1.96 (I checked it against Episode 2), so your answer is correct.

    Is it possible to edit it?
    From:
    Lower: D (L) = mx + b - [SEE][NORMSINV(97.5%)], and
    Upper: D (U) = mx + b + [SEE][NORMSINV(97.5%)]

    To

    Lower: D (L) = mx + b - [SEE][NORMSINV(95%)], and
    Upper: D (U) = mx + b + [SEE][NORMSINV(95%)]

    Thanks for your help.

    Regards

    Neha

  6. Hi David

    Recently I raised a query on Question 3, part (vii); I am all set now. My understanding is that you entered 97.5% as a one-tailed value.

    Thanks

    Neha

  7. Hi Neha,

    I am glad it's not an error!

    But you do raise a good point. I try to distinguish between:
    confidence INTERVAL (implies two-tailed) and
    confidence LEVEL (implies one-tailed)

    such that, as above,
    95% CI = 97.5% CL for a symmetrical distribution

    Not all readings, however, make this distinction (some refer to a VaR confidence interval, which doesn't quite sound right to me). But Jorion is, as usual, good about this: in reference to VaR, he is careful to speak of confidence LEVELS, not intervals.

    Thanks, David
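The interval-versus-level distinction in this thread is easy to check numerically. This small Python sketch uses the standard-library NormalDist, whose inv_cdf plays the role of Excel's NORMSINV for a standard normal. It shows that a two-tailed 95% confidence interval and a one-tailed 97.5% confidence level share the same critical value (about 1.96), whereas NORMSINV(95%) returns about 1.645.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Inverse of the standard normal CDF, like Excel's NORMSINV(p)
z_975 = std_normal.inv_cdf(0.975)  # one-tailed 97.5% level, about 1.96
z_95 = std_normal.inv_cdf(0.95)    # one-tailed 95% level, about 1.645

# Probability between -z_975 and +z_975: the two-tailed 95% interval
coverage = std_normal.cdf(z_975) - std_normal.cdf(-z_975)

print(f"NORMSINV(97.5%) = {z_975:.4f}")
print(f"NORMSINV(95%)   = {z_95:.4f}")
print(f"Probability within +/- {z_975:.4f}: {coverage:.4f}")
```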
