30 Mar

Early Bird Webinar #5 Follow-up [webinar]

by David Harper, CFA, FRM, CIPM


Thank you to the 92 attendees on Saturday’s live webinar (our final 2009 FRM Early Bird). On the paid member page, I just uploaded:

  • A recording of the session (2 hours plus about 25 minutes of Q&A at the end). We don’t have much control over the recorded webinar format: it is Windows Media (.wmv), so please note that you need the codec to play it.
  • The downloadable PDF of the PowerPoint presentation
  • The annotated spreadsheet that I used (to be uploaded by the end of the week)

In illustrating the linear regression via minimization of RSS (near the end), I used a spreadsheet from Introductory Econometrics: Using Monte Carlo Simulation with Microsoft Excel (Fatena asked about this book).

As an introduction to Gujarati’s inference and linear regression, I hope that we successfully reviewed the following ideas:

Sample estimators are random variables

We talked about the tricky idea of a sampling distribution.  Recall, in theory:

  • There is one unknowable population with parameters (e.g., population mean)
  • If I draw a random (i.i.d.) sample, I will get a different sample than you, if you draw your own sample. One population, but many samples. For this reason, there is sampling variation and the sample estimators are random variables (unlike the population parameters).
  • In Gujarati’s example 5.1 (the P/E ratios of 28 NYSE companies), the sample mean of 23.25 has two roles. See graphic below. Notice, first, that the population parameters (in Greek) are unknown. We draw samples to infer them. Then 23.25 is both a descriptive statistic (X bar) and an estimate (X hat) of the population parameter. (An estimate is the value produced by the estimator, which has been called a “recipe.”)
  • You must understand why the sample mean is asymptotically normal (below, top left): the central limit theorem (CLT). As discussed, I think it takes some mental work to grasp the CLT, but it’s key. It explains the undeniable strength of the normal (as a body measure, not a tail measure) and it explains why Student’s t statistics are used in (virtually) all tests of the regression coefficients.

captured_Image.png
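To make the sampling distribution idea concrete, here is a minimal simulation sketch. The population is hypothetical (an exponential distribution with mean 1, chosen only because it is clearly not normal), and the sample size of 28 simply mirrors the P/E example; none of the numbers come from Gujarati. Each "trial" is one person drawing their own sample and computing their own sample mean.

```python
import random
import statistics


def sample_means(draw, n, trials, seed=0):
    """Draw `trials` independent samples of size n and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


def draw_exponential(rng):
    # Hypothetical skewed population: exponential with mean 1 and standard deviation 1.
    return rng.expovariate(1.0)


n = 28  # mirrors the sample size of the P/E example; the population is made up
means = sample_means(draw_exponential, n=n, trials=20000)

mean_of_means = statistics.fmean(means)   # should sit near the population mean (1.0)
sd_of_means = statistics.stdev(means)     # should sit near sigma / sqrt(n)
theoretical_se = 1.0 / n ** 0.5

# Share of sample means within one standard error of the population mean.
# For a normal distribution this share would be roughly 68%.
within_one_se = sum(abs(m - 1.0) <= theoretical_se for m in means) / len(means)

print(f"mean of sample means: {mean_of_means:.4f}")
print(f"sd of sample means:   {sd_of_means:.4f} (theory: {theoretical_se:.4f})")
print(f"share within one SE:  {within_one_se:.3f}")
```

One population, many samples: the list `means` is the (simulated) sampling distribution, and even though the population is skewed, the sample means cluster around the population mean in a roughly bell-shaped way.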

Three ways to infer

image

We used Gujarati problem 5.14 to explore three methods to infer (hypothesis test). Recall, given a sample mean of 8 and a hypothetical population mean of 7.5 (i.e., null: mean = 7.5), we can infer three ways:

  • Compute the confidence intervals
  • Compare the test statistic to the critical “lookup” value
  • Look at the p value. The advantage of the p value is that we do not need to select a confidence/significance level. The p value is called the exact significance level (I prefer marginal significance level). It is the lowest significance level at which we can reject the null (i.e., call it significant). But here is how I like to use it: “We can reject the null with (1-p) confidence.”
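The three approaches can be sketched side by side. The sample mean (8) and the null (7.5) are from the problem as quoted above; the sample size of 25, the sample standard deviation of 1.0, and the 5% two-tailed significance level are assumptions made only so the example runs, and are not taken from Gujarati. The Student's t functions are hand-rolled numerical helpers (Simpson's rule plus bisection), not a library API.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry around zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Inverse CDF by bisection; intended for 0.5 < p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x_bar, mu_0 = 8.0, 7.5  # sample mean and null hypothesis, as quoted above
n = 25                  # assumed sample size
s = 1.0                 # hypothetical sample standard deviation
alpha = 0.05            # assumed two-tailed significance level

df = n - 1
se = s / math.sqrt(n)              # standard error of the sample mean
t_stat = (x_bar - mu_0) / se       # test statistic
t_crit = t_ppf(1 - alpha / 2, df)  # critical "lookup" value

# 1. Confidence interval: reject if the null value falls outside it.
ci = (x_bar - t_crit * se, x_bar + t_crit * se)
reject_by_ci = not (ci[0] <= mu_0 <= ci[1])

# 2. Test statistic versus critical value.
reject_by_stat = abs(t_stat) > t_crit

# 3. p value: the lowest significance level at which we can reject.
p_value = 2 * (1 - t_cdf(abs(t_stat), df))
reject_by_p = p_value < alpha

print(f"CI: ({ci[0]:.3f}, {ci[1]:.3f}), t = {t_stat:.3f}, critical t = {t_crit:.3f}")
print(f"p value = {p_value:.4f}, reject: {reject_by_ci}, {reject_by_stat}, {reject_by_p}")
```

Whatever inputs are used, the three decisions always agree at a given significance level; they are three views of the same test.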

And, again, why are we using the Student’s t for a distribution that is asymptotically normal per the CLT? Because we don’t know (are not given) the population variance. Somebody wisely asked how different the two distributions are. As shown above (red = Student’s t, green = normal), with a small sample of 25 (so d.f. = 24, as we consume one degree of freedom computing the sample mean that, in turn, is used to compute the test statistic), they are remarkably similar. Put another way, the Student’s t will always technically exhibit leptokurtosis (kurtosis > 3), but it is only technically fat-tailed; it is unlikely to be used as a genuine “heavy-tailed” distribution.
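A rough numerical check of how close the two distributions are at 24 degrees of freedom. This is only a sketch: the density is written out by hand, and the kurtosis uses the standard result that a Student's t has kurtosis 3 + 6/(d.f. - 4) when d.f. > 4.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_kurtosis(df):
    """Kurtosis of Student's t; only defined for df > 4."""
    if df <= 4:
        raise ValueError("kurtosis of Student's t requires df > 4")
    return 3 + 6 / (df - 4)


df = 24
normal = NormalDist()  # standard normal: mean 0, standard deviation 1

print(f"kurtosis: t({df}) = {t_kurtosis(df):.2f}, normal = 3.00")
for x in (0, 1, 2, 3, 4):
    print(f"x = {x}: t density = {t_pdf(x, df):.5f}, normal density = {normal.pdf(x):.5f}")
```

In the body the two densities are nearly identical; the t only pulls visibly ahead far out in the tails, which is the sense in which it is "technically" fat-tailed.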

Beta (slope of two-variable regression line) is same as the hedge ratio

I hope you can dwell on this formula below as it includes some fundamental relationships:

  • You must be able to translate covariance into correlation and vice-versa (yellow highlight)
  • Slope (beta) can be expressed as correlation multiplied by relative volatility (spot volatility/futures volatility)

captured_Image.png[7]
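The two relationships in the bullets can be checked numerically. In the sketch below, the spot and futures price changes are made-up numbers, and the variable names are mine. It verifies that covariance equals correlation times the two volatilities, and that the slope computed as covariance over futures variance matches correlation times relative volatility.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_cov(x, y):
    """Sample covariance with an (n - 1) divisor; sample_cov(x, x) is the variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical changes in spot and futures prices (illustration only).
spot_changes = [0.50, -0.20, 0.35, -0.40, 0.10, 0.25, -0.15, 0.30]
futures_changes = [0.60, -0.30, 0.40, -0.35, 0.05, 0.30, -0.25, 0.35]

cov_sf = sample_cov(spot_changes, futures_changes)
vol_s = math.sqrt(sample_cov(spot_changes, spot_changes))        # spot volatility
vol_f = math.sqrt(sample_cov(futures_changes, futures_changes))  # futures volatility

# Covariance to correlation, and back again.
rho = cov_sf / (vol_s * vol_f)
cov_rebuilt = rho * vol_s * vol_f

# Slope (beta, the hedge ratio) computed two equivalent ways.
beta_from_cov = cov_sf / vol_f ** 2
beta_from_corr = rho * (vol_s / vol_f)

print(f"correlation = {rho:.4f}")
print(f"beta via cov/var = {beta_from_cov:.4f}, via rho * (vol_s / vol_f) = {beta_from_corr:.4f}")
```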

OLS Linear Regression is the approach that minimizes RSS

We saw that ordinary least squares (OLS) gives a “best fit” line by minimizing the residual sum of squares (RSS). James wondered, why can’t we use the sum of absolute residuals instead of squaring the residuals? And, we could! There isn’t anything magically superior about OLS, except that under the CLRM assumptions it produces estimators that are BLUE (see Gauss-Markov, Chapter 7).
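A small sketch of both points, using made-up data with one deliberate outlier (this is not the spreadsheet from the webinar). The OLS line comes from the usual closed-form formulas for a two-variable regression, and it is compared with a hand-picked alternative line that passes through the four well-behaved points.

```python
def ols_fit(x, y):
    """Closed-form OLS intercept and slope for a two-variable regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def rss(x, y, intercept, slope):
    """Residual sum of squares for the line y = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def sum_abs(x, y, intercept, slope):
    """Sum of absolute residuals for the same line."""
    return sum(abs(yi - (intercept + slope * xi)) for xi, yi in zip(x, y))


# Hypothetical data: four points on y = 2x, plus one outlier at x = 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 30.0]

a_ols, b_ols = ols_fit(x, y)
a_alt, b_alt = 0.0, 2.0  # hand-picked alternative line: y = 2x

# Nudging the OLS intercept or slope in any direction should only raise RSS.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
nudged_rss = [rss(x, y, a_ols + da, b_ols + db) for da, db in nudges]

print(f"OLS line: y = {a_ols:.2f} + {b_ols:.2f}x")
print(f"RSS:      OLS = {rss(x, y, a_ols, b_ols):.1f}, y = 2x gives {rss(x, y, a_alt, b_alt):.1f}")
print(f"Sum |e|:  OLS = {sum_abs(x, y, a_ols, b_ols):.1f}, y = 2x gives {sum_abs(x, y, a_alt, b_alt):.1f}")
```

OLS wins on RSS by construction, but the alternative line wins on the sum of absolute residuals. Which line is "best" depends on the criterion; squaring simply penalizes the outlier more heavily.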

image

The really wonderful chart, IMO, is the residual plot:

image

Note that it gives a visual verification of several OLS properties related to the residuals:

  • Average residual is zero (how can that be? the intercept is solved to achieve this…)
  • No systematic pattern change in the dispersion; i.e., residuals are homoskedastic
  • The residuals are not autocorrelated (serially correlated)
  • The residuals are independent of X (the explanatory variable)
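The residual checks can also be done numerically. The sketch below uses simulated data (a hypothetical line y = 2 + 0.5x plus well-behaved normal noise), not the webinar spreadsheet. Two of the properties hold by construction for any OLS fit with an intercept: the residuals average to zero and have zero sample covariance with X. Constant dispersion and the absence of serial correlation are not guaranteed by the arithmetic, so the lag-1 autocorrelation here is only an informal diagnostic, much like eyeballing the plot.

```python
import random

rng = random.Random(42)

# Hypothetical data: y = 2 + 0.5x plus independent, constant-variance noise.
x = [float(i) for i in range(1, 31)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar  # the intercept is what forces the mean residual to zero

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Holds by construction (up to floating-point rounding).
mean_residual = sum(residuals) / n
cov_x_resid = sum((xi - x_bar) * e for xi, e in zip(x, residuals)) / (n - 1)

# Informal diagnostic: lag-1 autocorrelation of the residuals.
lag1_autocorr = sum(e0 * e1 for e0, e1 in zip(residuals, residuals[1:])) / sum(
    e * e for e in residuals
)

print(f"mean residual:            {mean_residual:.2e}")
print(f"cov(x, residual):         {cov_x_resid:.2e}")
print(f"lag-1 autocorrelation:    {lag1_autocorr:.3f}")
```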

Comments

  1. $199 for half the material/level1 ; when will that deal be available online you think?thanks

    Indy M.

  2. Thank you David,
    for the lesson last Saturday, an overview of regression.
    As a follow-up, I read chapters 6 and 8 of the Schaum’s Outline Series book Probability and Statistics. I do not yet have access to Gujarati’s Econometrics, the book you referred to in the presentation. Just wondering what else this book could add. The basics are well covered in the Schaum’s book anyway. Can you comment on whether Gujarati is an absolute must?
