What's new

P1.T2.505. Model selection criteria (Diebold)

Nicole Seaman

Director of FRM Operations
Learning outcomes:
- Define mean squared error (MSE) and explain the implications of MSE in model selection.
- Explain how to reduce the bias associated with MSE and similar measures.
- Compare and evaluate model selection criteria, including s^2, the Akaike information criterion (AIC), and the Schwarz information criterion (SIC).
- Explain the necessary conditions for a model selection criterion to demonstrate consistency.


505.1. Suppose a trend model over one hundred observations has eight parameters and its sum of squared residuals is equal to 1,435; i.e., T = 100, k = 8, SSR (aka, residual sum of squares) = 1,435. Let's define "corrected MSE" as the mean squared error (MSE) that is penalized for degrees of freedom used. Which are nearest, respectively, to (i) the mean squared error (MSE), (ii) the corrected MSE, and (iii) the standard error of the regression, SER?

a. MSE = 13.29; corrected MSE = 14.35; SER = 3.95
b. MSE = 14.35; corrected MSE = 15.60; SER = 3.95
c. MSE = 14.35; corrected MSE = 26.43; SER = NA (not available with information given)
d. MSE = 179.38; corrected MSE = 14.35; SER = NA
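Before clicking through to the answers, the arithmetic can be sketched in a few lines of Python using Diebold's definitions: MSE = SSR/T, the degrees-of-freedom-corrected MSE (i.e., s^2) = SSR/(T - k), and SER = sqrt(s^2). This is only a sketch of the definitions, not the answer explanation:

```python
from math import sqrt

# Given in the question: T observations, k parameters, sum of squared residuals
T, k, SSR = 100, 8, 1435.0

mse = SSR / T        # mean squared error: SSR divided by sample size
s2 = SSR / (T - k)   # "corrected" MSE: penalized for degrees of freedom used
ser = sqrt(s2)       # standard error of the regression: square root of s^2

print(f"MSE = {mse:.2f}, corrected MSE = {s2:.2f}, SER = {ser:.2f}")
# prints "MSE = 14.35, corrected MSE = 15.60, SER = 3.95"
```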

505.2. Consider the following conditions for a model selection criterion to demonstrate consistency:

I. When the true model (that is, the data-generating process, or DGP) is among the models considered, the probability of selecting the true DGP approaches one as the sample size gets large, and
II. When the true model is not among those considered, so that it’s impossible to select the true data-generating process (DGP), the probability of selecting the best approximation to the true DGP approaches one as the sample size gets large.

Which of these conditions is (are) true?

a. Neither
b. I. only
c. II. only
d. Both

505.3. Consider the fitting of a polynomial trend model with (p) powers of time, T(t) = B(0) + B(1)*TIME(t) + B(2)*TIME(t)^2 + ... + B(p)*TIME(t)^p. Each of the following statements is true EXCEPT which is false?

a. As we include higher powers of time, the sum of squared residuals can’t rise, because the estimated parameters are explicitly chosen to minimize the sum of squared residuals; therefore, the more variables we include in a forecasting model, the lower the sum of squared residuals will be, and therefore the lower MSE will be, and the higher R^2 will be
b. The mean squared error (MSE) is a biased estimator of out-of-sample 1-step-ahead prediction error variance: the reduction in mean squared error (MSE) as higher powers of time are included in the model occurs even if they are, in fact, of no use in forecasting the variable of interest
c. While the MSE and SIC are inconsistent, the AIC is consistent
d. In-sample overfitting (aka, data mining) refers to the idea that including more variables in a forecasting model won’t necessarily improve its out-of-sample forecasting performance, although it will improve the model’s fit on historical data.
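On the criteria compared in option (c): Diebold expresses each criterion as SSR/T multiplied by a penalty factor, namely T/(T - k) for s^2, e^(2k/T) for the AIC, and T^(k/T) for the SIC. A minimal sketch, reusing the T, k, and SSR from question 505.1 purely for illustration, shows how the SIC's harsher degrees-of-freedom penalty compares:

```python
from math import exp

# Illustrative inputs (borrowed from question 505.1, not part of 505.3)
T, k, SSR = 100, 8, 1435.0

mse = SSR / T                 # unpenalized in-sample MSE
s2 = mse * T / (T - k)        # penalty factor T/(T - k)
aic = mse * exp(2 * k / T)    # penalty factor e^(2k/T)
sic = mse * T ** (k / T)      # penalty factor T^(k/T), the harshest

print(f"s^2 = {s2:.2f}, AIC = {aic:.2f}, SIC = {sic:.2f}")
# prints "s^2 = 15.60, AIC = 16.84, SIC = 20.74"
```

For any reasonable sample size the SIC penalty grows fastest in k, which is what underlies the SIC's consistency relative to the other criteria.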

Answers here: