FRM Episode #2 - Quant B (Econometrics)
by David Harper, CFA, FRM, CIPM
In this issue
About Episode #2
Episode #2 is called Quant B. This is a review of the first four assigned chapters (chapters one to four) in Essentials of Econometrics. As usual, members can access it on the premium page; non-members can sample the first 15 minutes here.
I would like to highlight a few key themes for this episode (Gujarati's first four chapters):
Limitation of the linear regression model and the related importance of the random (or stochastic) error term
The deterministic linear regression model is Y = mX + b. But this model is limited: it does not imply causality (it may be "merely predictive"). And it does not recognize a non-linear relationship. Worse, in the real world, the dependent variable (Y) is almost certainly a function of more than one independent variable (X). Adding the error term is therefore important and changes a deterministic model into a statistical or econometric model: Y = mX + b + error. The error contains everything we don't know about impacts on the dependent variable, which is a lot!
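To make the role of the error term concrete, here is a minimal sketch that simulates Y = mX + b + error and then recovers m and b from the noisy data. The "true" parameters, the noise level, the seed, and the use of ordinary least squares as the fitting method are all my assumptions for illustration; none of them come from Gujarati's text.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical "true" parameters, chosen only for illustration
m_true, b_true, noise_sd = 2.0, 1.0, 0.5

xs = [i / 10 for i in range(100)]
# Statistical model: deterministic line plus a random error term
ys = [m_true * x + b_true + random.gauss(0.0, noise_sd) for x in xs]

def ols_fit(x, y):
    """Return (slope, intercept) from ordinary least squares with one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

m_hat, b_hat = ols_fit(xs, ys)
# Residuals are the sample counterpart of the unobserved error term
residuals = [y - (m_hat * x + b_hat) for x, y in zip(xs, ys)]
print(f"estimated slope={m_hat:.3f}, intercept={b_hat:.3f}")
print(f"mean residual={sum(residuals) / len(residuals):.6f}")
```

The estimates land near, but not exactly on, the true values: the gap is the error term at work.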
Marginal, Conditional and Multivariate Probabilities
It is good to be clear on these terms because they are used a lot. If you would like a really concrete example, I uploaded Gujarati's example (PC versus printer sales) to a spreadsheet that calculates all possibilities (see spreadsheets below).
To further illustrate the difference, consider a simple basket of two bonds, bond G and bond H. Then ask three questions:
- P(G): What is the probability that bond G will default? This is a marginal (unconditional) probability; we ask (a priori) without information.
- P(G|H): What is the probability that bond G will default given that (conditional on) bond H has defaulted? This is a conditional probability. Note if the bonds are independent (i.e., zero default correlation), then bond H doesn't impact bond G and you'll get the same answer as P(G). Mathematically, P(G|H) = P(G,H)/P(H) but independence requires P(G,H) = P(G)P(H), so P(G|H) = P(G)P(H)/P(H) = P(G).
- P(G,H): What is the probability that both bonds jointly default? This is the joint probability.
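The three questions above can be worked through on a small joint table. The probabilities below are hypothetical numbers I made up for illustration (they are not from Gujarati or from any real bonds); the point is only to show how the marginal, joint, and conditional probabilities relate.

```python
# Hypothetical joint default probabilities for bonds G and H (illustrative numbers).
# Keys are (G defaults?, H defaults?).
joint = {
    (True, True): 0.02,
    (True, False): 0.03,
    (False, True): 0.08,
    (False, False): 0.87,
}

# Marginal: sum the joint probabilities over the other bond's outcomes
p_g = sum(p for (g, _), p in joint.items() if g)
p_h = sum(p for (_, h), p in joint.items() if h)

# Joint and conditional
p_gh = joint[(True, True)]
p_g_given_h = p_gh / p_h

print(f"P(G)={p_g:.4f}  P(H)={p_h:.4f}  P(G,H)={p_gh:.4f}  P(G|H)={p_g_given_h:.4f}")

# Under independence the joint is the product of the marginals, so P(G|H) = P(G)
p_gh_indep = p_g * p_h
print(f"independent case: P(G,H)={p_gh_indep:.4f}  P(G|H)={p_gh_indep / p_h:.4f}")
```

In this made-up table the bonds are not independent: knowing that H defaulted raises the probability of G defaulting well above its marginal value.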
Distributions: continuous/discrete and density/cumulative
On a superficial level, all distributions have the same job. A mathematical function describes the behavior of a random variable, giving shape to a distribution. The area under the distribution's curve (or step function) equals 1.0. We produce a sample statistic. We use the distribution to judge the statistic: is this a likely outcome (are we near the mean) or unlikely (are we out in the tail?). Why are some distributions complex? Often because we are trying to give more realistic shape to a distribution; e.g., fatter tail, more skew. A lot of the operational risk distributions are complex specifically because they are trying to capture the special fatness of the operational loss tail, so they have parameter complexity.
Note Gujarati's classic distinction between two types of distributions and then, as a different matter, two functional questions about distributions:
- Can we count the random variable (e.g., 1, 2, 3, heads, tails, default, no default) or must we measure the random variable (e.g., what is the distance to default?)? If we can count the random variable, it is discrete (PMF or step function CDF). If we must measure it, it is continuous (PDF or continuous CDF)
- Do we want to know if the random variable will be "less than" a value (CDF) or do we want to know if the random variable will be "equal to" (approximately equal to, in the case of continuous) a certain value (PMF or PDF)?
Note that discrete versus continuous is not a choice, it is a feature of the random variable. The normal, for example, is always continuous; a coin flip is always discrete. On the other hand, regardless of the distribution (discrete or continuous), we can ask either the density/mass question or the cumulative question: what is the probability P(X=x) or what is the probability P(X<=x)?
| | P[X=x] or P[x1 <= X <= x2] | P[X<=x] |
| Discrete | PMF | CDF (step) |
| Continuous | PDF | CDF (continuous) |
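As a rough illustration of the four cells in this grid, the sketch below uses a binomial (coin flips) for the discrete case and the standard normal for the continuous case. The choice of ten fair flips and the specific evaluation points are my assumptions, picked only to keep the arithmetic small.

```python
from math import comb
from statistics import NormalDist

# Discrete: number of heads in 10 fair coin flips (binomial)
n, p = 10, 0.5

def binom_pmf(k):
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k):
    """P(X <= k): the step-function CDF is a running sum of the PMF."""
    return sum(binom_pmf(i) for i in range(k + 1))

print(f"PMF  P(X = 5)  = {binom_pmf(5):.4f}")
print(f"CDF  P(X <= 5) = {binom_cdf(5):.4f}")

# Continuous: standard normal
z = NormalDist(mu=0.0, sigma=1.0)
print(f"PDF  f(0)        = {z.pdf(0.0):.4f}  (a density, not a probability)")
print(f"CDF  P(Z <= 0)   = {z.cdf(0.0):.4f}")
# For a continuous variable, probabilities come from intervals: a difference of CDFs
print(f"P(-1 <= Z <= 1)  = {z.cdf(1.0) - z.cdf(-1.0):.4f}")
```

Note that in the continuous case the PDF value is a height, not a probability; a probability only appears once we take the area over an interval.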
About random variables: expected value, dispersion (variance), and relationship (covariance)
You need to memorize several formulas in Chapter 3 of Gujarati. These fall into three buckets:
- Expected value; e.g., what's the expected return of an asset?
- Dispersion (variance); e.g., what's the volatility (=SQRT[variance]) of an asset?
- Relationship (covariance and correlation); e.g., what's the correlation between assets?
In my opinion, you should know the following cold, as they are used often:
- Variance (X) = E(X^2) - [E(X)]^2. Know this cold. For example, what's the variance of a coin toss (H=1, T=0)? (Answer at end of letter)
- Variance (X+Y) = var(X) + var (Y) + 2Cov(X,Y). This is used in the Investment Discipline (module 5) to calculate the variance of a two-asset portfolio.
- Cov(X,Y) = Correlation (X,Y)*Standard Deviation(X)*Standard Deviation(Y). Know this cold. Solve for the correlation and see what we can say about correlation: "correlation is covariance translated by (divided by) the product of volatilities into a unitless measure." Please make sure you can substitute this covariance into the additive variance above (for the expanded two-asset portfolio variance).
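A quick numerical check of these three formulas may help them stick. The two return series below are hypothetical numbers invented for illustration, and the helper names (pvar, pcov) are mine; the sketch uses population (divide by n) moments throughout so the identities hold exactly.

```python
from math import sqrt

# Hypothetical paired returns for two assets (illustrative numbers, in %)
x = [1.0, -0.5, 2.0, 0.5, -1.0]
y = [0.5, -1.0, 1.5, 1.0, -0.5]

def mean(v):
    return sum(v) / len(v)

def pvar(v):
    """Population variance via the shortcut E(X^2) - [E(X)]^2."""
    return mean([vi**2 for vi in v]) - mean(v) ** 2

def pcov(a, b):
    """Population covariance via E(XY) - E(X)E(Y)."""
    return mean([ai * bi for ai, bi in zip(a, b)]) - mean(a) * mean(b)

cov_xy = pcov(x, y)
# Correlation: covariance divided by the product of standard deviations
corr_xy = cov_xy / (sqrt(pvar(x)) * sqrt(pvar(y)))

# Variance of the sum, computed directly and via the additive formula
direct = pvar([xi + yi for xi, yi in zip(x, y)])
formula = pvar(x) + pvar(y) + 2 * cov_xy
# Same formula with covariance replaced by correlation * sd(X) * sd(Y)
expanded = pvar(x) + pvar(y) + 2 * corr_xy * sqrt(pvar(x)) * sqrt(pvar(y))

print(f"cov={cov_xy:.4f}  corr={corr_xy:.4f}")
print(f"var(X+Y): direct={direct:.4f}  formula={formula:.4f}  expanded={expanded:.4f}")
```

All three versions of var(X+Y) should agree up to floating-point rounding, which is the substitution the last bullet asks you to be comfortable with.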
Sample versus population 'moments'
I don't have a shortcut for the sample moments (sample mean, sample variance, sample skew, and sample kurtosis, plus sample covariance and sample correlation). You'll note they all tend to divide by (n-1) because (n-1) is degrees of freedom (d.f.). Degrees of freedom are notoriously hard to explain (note the confusion behind the Wikipedia entry!). Gujarati's explanation on page 93 is the best I've seen.
In regard to the sample statistic, I think of this as taking an "almost average." As in, the population variance is the expected square of deviations (deviations from the average). To get an expected value, we take an average: we sum (the deviations squared) and divide by n. But for a sample, we take an "almost average:" sum the deviations squared and divide by (n-1) instead of (n). Intuitively, we admit our uncertainty with a slightly larger variance.
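The "average" versus "almost average" distinction is easy to see in a few lines. The returns below are hypothetical values chosen for illustration; the cross-check uses Python's standard statistics module, where pvariance divides by n and variance divides by (n-1).

```python
import statistics

# Hypothetical small sample of daily returns (in %), for illustration only
returns = [0.4, -1.2, 0.9, 0.1, -0.6, 1.5, -0.3, 0.8]
n = len(returns)
mean = sum(returns) / n
sum_sq_dev = sum((r - mean) ** 2 for r in returns)

pop_var = sum_sq_dev / n           # "average": divide by n
sample_var = sum_sq_dev / (n - 1)  # "almost average": divide by n - 1

# The standard library exposes both conventions
assert abs(pop_var - statistics.pvariance(returns)) < 1e-9
assert abs(sample_var - statistics.variance(returns)) < 1e-9

print(f"population variance (n)   = {pop_var:.4f}")
print(f"sample variance     (n-1) = {sample_var:.4f}")
```

The sample variance is always larger by the factor n/(n-1), and that factor shrinks toward 1 as the sample grows.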
Distributions: normal, Student's t, chi-square, and F distribution
You have four distributions to learn but a single shared pattern:
- We estimate a sample mean and/or variance. For example, we can collect Google's mean return and variance over a historical sample of X days; also, Yahoo's mean return and variance over a historical sample of Y days.
- We translate the sample estimate into a random variable (a test statistic). If we are looking for Google's true (population) mean, we standardize the sample mean. If we are looking for Google's true (population) variance, we create a chi-square variable [sample variance / hypothesized population variance]*(degrees of freedom). If we want to decide if Google and Yahoo share the same true (population) variance, we create an F ratio [higher sample variance / lower sample variance].
- We compare the test statistic to the appropriate distribution. The statistical lookup tables (e.g., student's t) give us a value that we do not expect to exceed with x% confidence. If our test statistic exceeds that value, we have an unlikely random variable (we reject the null hypothesis).
Note that both the normal and the student's t distribution are used to test the sample mean against a hypothetical population mean. But we use the student's t when the variance is unknown and the sample is small (if the sample is large, the student's t approximates the normal anyway).
| Distribution | Test | Test statistic |
| Standard Normal | Sample mean versus (hypothetical) population mean, when we know the population variance or have a large sample | (sample mean - hypothetical mean) / [standard deviation/SQRT(n)], using the population standard deviation (or the sample standard deviation if the sample is large) |
| Student's t | Sample mean versus (hypothetical) population mean, when we don't know the population variance | (sample mean - hypothetical mean) / [sample standard deviation/SQRT(n)] |
| Chi-square | Sample variance versus (hypothetical) population variance | [sample variance / hypothetical population variance]*(n-1) |
| F distribution | Ratio of two sample variances: are they from populations with the same variance? | (larger sample variance) / (smaller sample variance) |
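The three sample-based statistics in this table (t, chi-square, and F) can be sketched as follows. The two return series and the hypothesized values (a mean of zero, a variance of 1.0) are hypothetical, made up for illustration and not taken from the member spreadsheets. The sketch stops at the test statistic; the comparison against a critical value still needs a lookup table or a statistics library.

```python
from math import sqrt
import statistics

# Hypothetical daily return samples (in %), illustrative numbers only
google = [0.5, -1.1, 0.8, 1.4, -0.3, 0.2, -0.9, 1.0, 0.6, -0.4]
yahoo = [1.2, -2.0, 0.3, 2.1, -1.5, 0.9, -0.7, 1.8, -1.1, 0.4]

n = len(google)
g_mean = statistics.mean(google)
g_var = statistics.variance(google)  # sample variance, divides by n - 1
y_var = statistics.variance(yahoo)

# Student's t: is the population mean different from zero?
mu_0 = 0.0  # hypothesized population mean
t_stat = (g_mean - mu_0) / (sqrt(g_var) / sqrt(n))  # n - 1 degrees of freedom

# Chi-square: could the population variance equal a hypothesized value?
var_0 = 1.0  # hypothetical hypothesized population variance
chi2_stat = (g_var / var_0) * (n - 1)  # n - 1 degrees of freedom

# F: do the two populations share the same variance? Larger variance on top.
f_stat = max(g_var, y_var) / min(g_var, y_var)

print(f"t={t_stat:.3f}  chi-square={chi2_stat:.3f}  F={f_stat:.3f}")
```

Putting the larger variance in the numerator keeps the F ratio at or above 1, so only the upper tail of the F table is needed.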
New Spreadsheets Added
I uploaded several new EditGrid spreadsheets to the member area. These "learning worksheets" can be accessed in three ways:
- Simply view in the browser,
- Open into MS Excel: Select File > Export As > Excel (.xls), or
- Most have a downloadable "native" Excel file (XLS) associated with the entry.
For this episode #2, I uploaded the following spreadsheets:
- Gujarati's marginal versus conditional versus joint probability: I reproduced his example (Table 2-3 on PCs and Printers sold) and illustrated the difference between these three. A simple spreadsheet to illustrate basic but important ideas.
- Sample Skew and Sample Kurtosis: Dataset is Google's daily returns in 2007 (i.e., a series of 251 observations). Illustrates the calculation of sample skew and sample kurtosis.
- Student's t distribution: Dataset is a small sample of Google's stock returns (ten trading days). Then ask, is the (population) mean return really different from zero?
- Chi-square distribution: Dataset is an almost-small sample of Google's daily price returns (30 days). Calculate the sample variance and ask, could the population variance be 0.02%?
- F distribution: Dataset is sample variance for Google and sample variance for Yahoo. Then ask, could their respective population variances be the same?
But, as I wrote last time, the spreadsheets are not required study. Nor do I recommend them for your "first pass" if you are just getting your feet wet in the FRM. As I mentioned in the Episode, I hope they may be helpful in some cases, where you seek concrete clarification.
Screencast Tutorial
Paid members can access the screencast in the member section. In addition to the viewable screencast, the following are available:
- The underlying PowerPoint slides, downloadable in PDF format
- An iPod format (.m4v)
- A downloadable version of the screencast in a .zip file. (Save to a new directory on your local drive and launch the .html file.)
Non-members can sample the start of the screencast tutorial here.
Practice Questions
I wrote the following (new) questions to engage you in this episode. My answers are located here in the forum. These are not really exam-type questions, as exam-type questions tend to be very specific (except the Bayes' Formula question, which is typical). Instead, I hope these may provoke you to think about the episode and, in some cases, draw connections to other areas of the FRM curriculum.
Question #1:
In the Wilmott reading (Chapter 22 VaR), the Generalized Pareto distribution (GPD) is used to approximate losses that exceed some threshold (peaks over threshold).
Is the GPD a PMF, PDF or CDF?
Question #2:
A priori, assume the odds of a recession next year are 25% (and 75% that there will be no recession, so only two outcomes). If the economy does NOT go into a recession, the likelihood that bond XYZ will default is only 1.0% (therefore, there is a 99% probability of no default). If on the other hand the economy DOES descend into recession, the likelihood bond XYZ will default increases to fully 9.0%. At the end of the year, we observe the bond defaulted.
What is the POSTERIOR probability that the economy went into recession? (Using Bayes Theorem)
Question #3:
In Operational Risk, we consider the Loss Distribution Approach (LDA) which generates a total loss distribution by combining (compounding) two different distributions: a distribution that characterizes the FREQUENCY of operational losses (e.g., how many times do we exceed some loss level during a certain period) and another distribution that characterizes the SEVERITY of operational losses (what is the amount of the loss?). Are these distributions likely to be discrete or continuous?
Question #4:
Assume a portfolio with only two bonds. The probability that obligor A will default is F(x) = u. Separately, the probability that obligor B will default is F(y) = v. Further, assume the two credits are INDEPENDENT; i.e., they have a default correlation of zero.
- (i) What does Gujarati call F(x) and F(y)? PMF, PDF, or CDF
- (ii) Using this two-bond example, illustrate the idea of a CONDITIONAL PROBABILITY.
- (iii) If we want to estimate the probability that both credits will simultaneously default (i.e., we purchased a 2nd-to-default swap), what is that probability called and what is the notation?
- (iv) What does their independence imply about the joint probability of default? (as a function of the marginal probabilities)
- (v) Tough: Wikipedia says about copulas: "The theorem proposed by Sklar underlies most applications of the copula. Sklar's theorem states that given a joint distribution function H for p variables, and respective marginal distribution functions, there exists a copula C such that the copula binds the margins to give the joint distribution." What is the copula function in this example, where the default correlation is zero?
Question #5:
Assume we manage to characterize a portfolio as a linear combination of risk factors that are each normally distributed. For example, our portfolio return = aX1 + bX2 + cX3 ... where X1, X2, etc are normally distributed random variables.
- (i) Do we typically describe parametric value at risk (VaR) in terms of a PDF, PMF, and/or CDF?
- (ii) Can we say anything about the distribution of the portfolio? Under what condition(s)?
Question #6:
Let's say we find a good non-normal, fat-tailed distribution that characterizes an asset's returns.
- (i) What is the meaning of the sampling distribution of the sample mean?
- (ii) Is there anything we can say about it?
- (iii) Why might this not necessarily help us, from a risk perspective?
Question #7:
Say we draw a sample of Google's daily periodic stock returns. We also draw a sample of Yahoo's daily periodic price returns. In regard to Google's historical sample, let Gm, Gv, Gn be the sample mean(Gm), the sample variance (Gv) and the sample size (Gn). In regard to Yahoo, let Ym, Yv, and Yn refer to Yahoo's sample mean (Ym), sample variance (Yv) and sample size(Yn).
For each scenario below, what is the correct distribution and what is the test statistic?
- (i) We want to test if Google's population mean is different from zero.
- (ii) We want to test if Google's population variance is 0.1%.
- (iii) We want to test if our observed sample variances (i.e., Google's Gv and Yahoo's Yv) are drawn from populations with identical variances.
Question #8:
Among the student's t, the chi-squared and the F distributions:
- (i) Under what circumstances do they converge to the normal distribution?
- (ii) Under what circumstances do they converge to the standard normal distribution?
Question #9:
Assume we run a Monte Carlo simulation (Wilmott reading) to simulate the price path of a single stock. Our algorithm assumes the stock's returns are normally distributed (we will use Geometric Brownian Motion, GBM) and we will run a single simulation of 100 trials. The simulation will produce a distribution of final outcomes: simulated stock price levels at a future date. Must this simulated distribution (of future price levels) match any distribution, and if so, which?
Question #10:
In the FRM, we are repeatedly reminded that asset returns are likely to exhibit "fat-tails" (kurtosis > 3), in addition to being skewed and unstable.
If fat-tails are true, it is dangerous to depend on the normal distribution to model returns (see Taleb's Black Swan). Several remedies (or solutions or compensations) are suggested in the curriculum.
In short, as Linda Allen writes, normality cannot be salvaged. Cite a few solutions to this fat-tail problem that we've already seen in the curriculum.
My answers to these questions (maybe you can do better!)
- Question 1
- Question 2
- Question 3
- Question 4
- Question 5
- Question 6
- Question 7
- Question 8
- Question 9
- Question 10
Thanks very much. I will write again in two weeks with another newsletter.
David Harper, CFA, FRM, CIPM
Founder
www.bionicturtle.com
Comments
I forgot to answer my own question about the variance of a coin toss, where Heads = 1, Tails = 0.
That means X = {0,1} and X^2 = {0,1}
E[X^2] = average {0,1} = 0.5
E[X] = 0.5
So, E(X^2) - [E(X)]^2 = 0.5 - [0.5]^2 = 0.25
Thankfully, Excel agrees: =VARP(0,1) = 0.25, because it’s a population variance. The sample variance is =VAR(0,1) = 0.5.
I want to appreciate David Harper on his efforts to educate us and I would say kept up