
# P1.T2.716. Central limit theorem and mixture distributions (Miller Ch.4)

#### David Harper CFA FRM

Learning objectives: Describe the central limit theorem and the implications it has when combining independent and identically distributed (i.i.d.) random variables. Describe i.i.d. random variables and the implications of the i.i.d. assumption when combining random variables. Describe a mixture distribution and explain the creation and characteristics of mixture distributions.

Questions:

716.1. Your colleague Patricia is conducting a regression analysis based on a large sample (N > 30) from her bank's customer database.
• The dependent variable (aka, regressand) is the customer's FICO credit score
• The independent variable (aka, regressor) is an internal composite score based on the customer's education level and other factors
Her classical linear regression model (CLRM) is therefore given by FICO(i) = β(0) + β(1)*SCORE(i) + u(i). She generates her sample regression function (SRF) from a large random sample of size N where the population presumably has a mean, µ, and finite variance, σ^2. For example, her sample dependent values are FICO(1), FICO(2), FICO(3), ..., FICO(N). We can assume her random selections are identically and independently distributed (i.i.d.).

Each of the following statements is true EXCEPT which one?

a. According to the central limit theorem (CLT), as the sample size increases (i.e., as N -> ∞), the sample average of the FICO scores will itself tend to follow a normal distribution
b. According to the CLT, the intercept, β(0), and slope, β(1), estimators in her regression should follow an approximately normal distribution
c. If the other assumptions of CLRM are valid, including that the error term has a conditional mean of zero and constant variance (i.e., homoskedastic), then the error terms are approximately normal
d. If the FICO score--which happens to be the dependent variable (aka, regressand) in the regression--is positively or negatively skewed, then the distribution of its own sample mean will be skewed even for a large sample; and further, this will violate a CLRM assumption if we regress it against the internal composite score
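
For readers who want to experiment with the CLT claims in this question, here is a minimal simulation sketch. It does not use FICO data; it assumes a hypothetical right-skewed population (exponential with rate 1.0) and measures how the skewness of the sample mean changes with the sample size N. The function name, the number of trials, and the seed are illustrative choices, not part of the question.

```python
import random
import statistics


def sample_mean_skewness(n, trials=5000, seed=42):
    """Estimate the skewness of the sampling distribution of the mean.

    Assumption: the population is exponential(rate=1.0), which is
    right-skewed. Each trial draws n i.i.d. values and records their mean.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]
    mu = statistics.fmean(means)
    sd = statistics.pstdev(means)
    # Skewness is the third standardized moment of the simulated means.
    return statistics.fmean(((m - mu) / sd) ** 3 for m in means)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(sample_mean_skewness(n), 3))
```

Under these assumptions the estimated skewness should shrink toward zero as N grows, which is the behavior the CLT predicts for i.i.d. draws from a population with finite variance.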

716.2. Your colleague Peter is selecting probability distributions in order to perform several Monte Carlo simulations. Each of the following choices appears to be logical or sensible, EXCEPT which choice prima facie appears to be a mistake?

a. To model recovery rates for high yield bonds, he selects a beta distribution
b. To model a continuous variable that is non-negative, right-skewed and tends toward the normal distribution as the degrees of freedom (d.f.) increase, he selects either a chi-square or F-distribution
c. To model a light-tailed distribution (aka, platykurtosis where kurtosis < 3) he selects either a Poisson distribution or a Student's t distribution whose degrees of freedom (df) is less than 30
d. To model a continuous outcome within a finite range (a,b) using a distribution that is only slightly more complex than a uniform distribution, but allows him to specify a unique mode, he selects a triangular distribution with parameter (c) equal to the mode
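
As a study aid for the choices above, the sketch below does two things using only the Python standard library: it evaluates the textbook kurtosis formulas for the Student's t distribution (3 + 6/(df - 4), finite only for df > 4) and the Poisson distribution (3 + 1/λ), and it draws samples from a beta and a triangular distribution. The function names and all parameter values (the beta shape parameters, the triangular range and mode) are hypothetical and chosen only for illustration.

```python
import math
import random


def student_t_kurtosis(df):
    """Kurtosis (not excess) of a Student's t distribution."""
    if df <= 2:
        raise ValueError("kurtosis is undefined for df <= 2")
    if df <= 4:
        return math.inf  # the fourth moment does not exist for 2 < df <= 4
    return 3.0 + 6.0 / (df - 4.0)


def poisson_kurtosis(lam):
    """Kurtosis (not excess) of a Poisson distribution with mean lam > 0."""
    return 3.0 + 1.0 / lam


rng = random.Random(0)

# Hypothetical recovery rates: beta draws always fall in [0, 1].
recoveries = [rng.betavariate(2.0, 2.0) for _ in range(1000)]

# Hypothetical bounded outcome on (a, b) = (1, 5) with mode c = 2.
# Note the argument order of random.triangular is (low, high, mode).
outcomes = [rng.triangular(1.0, 5.0, 2.0) for _ in range(1000)]

if __name__ == "__main__":
    print(student_t_kurtosis(10), poisson_kurtosis(4.0))
    print(min(recoveries), max(recoveries))
    print(min(outcomes), max(outcomes))
```

Comparing the printed kurtosis values against the normal benchmark of 3 is one way to check a tail-weight claim before committing to a distribution in a simulation.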

716.3. Plotted below is a normal mixture distribution with the following two components:
• The first component is a normal distribution with mean of zero and standard deviation of 2.0 that is assigned a weight of 40.0%;
• The second component is a normal distribution with mean of 2.0 and standard deviation of 1.0 that is assigned a weight of 60.0%.
In this way, the mixed random variable (let's call it 'W') has a density given by f_W(w) = 0.40*f_X(w) + 0.60*f_Y(w), where X ~ N(0, 2^2) and Y ~ N(2, 1^2). Which is nearest to the probability that the mixed random variable will be less than zero?
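
One way to check an answer to this question: the cumulative distribution function of a mixture is the weighted sum of the component CDFs, so P(W < 0) = 0.40*P(X < 0) + 0.60*P(Y < 0). The sketch below evaluates that expression with the standard library; the function names `normal_cdf` and `mixture_cdf` are illustrative, and the component list simply restates the weights, means, and standard deviations given above.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(x, components):
    """CDF of a normal mixture.

    components is an iterable of (weight, mean, sd) tuples whose weights
    are assumed to sum to 1.
    """
    return sum(w * normal_cdf(x, m, s) for w, m, s in components)


# Components as stated in the question: (weight, mean, standard deviation).
components = [(0.40, 0.0, 2.0), (0.60, 2.0, 1.0)]
p_below_zero = mixture_cdf(0.0, components)

if __name__ == "__main__":
    print(round(p_below_zero, 4))
```

Note that this treats W as a mixture (each draw comes from X with probability 40% or from Y with probability 60%), which is different from the weighted sum of two random variables.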