F-Statistic Formula Variations

Thread starter #1
Hi,
@David Harper CFA FRM indicated in a thread that "The general form of the F-statistic is F[numerator df, denominator df] = (ESS/df)/(RSS/df)."

The F-statistic is also expressed as { Sum of Squares BETWEEN / df-BETWEEN } / { Sum of Squares WITHIN / df-WITHIN }. I understand this expression of the F-statistic conceptually and intuitively.

However, there are some other variations of the F-statistic formula:
F-statistic = { (SSR-Restricted - SSR-Unrestricted) / q } / { SSR-Unrestricted / (N - k - 1) }, where q is the number of restrictions, which again I understand conceptually and intuitively.

Also,
F-statistic = { (R-Unrestricted^2 - R-Restricted^2) / q } / { (1 - R-Unrestricted^2) / (N - k - 1) }



How does (SSR-Restricted - SSR-Unrestricted) translate to (R-Unrestricted^2 - R-Restricted^2), given that in general R^2 = 1 - SSR / (SSR + SS-Explained)?
 

David Harper CFA FRM
#2
Hi @gargi.adhikari After @Nicole Seaman posts the F-ratio video to our YouTube section, I will edit the summary below for greater clarity. For the moment, because I don't have much extra bandwidth, I am going to "collect" the first rough draft; we have a useful tag = https://www.bionicturtle.com/forum/tags/f-statistic/ which contains the key prior conversations. Below are three references.

In brief summary:
  • In my opinion, the most relevant (especially for exam purposes) regression F-statistic is explained in my YouTube video (Reference #3 below). This most relevant F-statistic is also the most basic; S&W call it the "overall" regression F-statistic because it tests the joint (null) hypothesis that all slope coefficients are zero. This is typically how we first encounter it: the null posits that jointly β1 = 0 ∩ β2 = 0 ∩ β3 = 0 ∩ ... ∩ βk = 0 for a "total of q restrictions" (here q = k), so the alternative is that "one or more of the restrictions does not hold."
    • This overall regression F-statistic, as I explain in Reference #1 below, is a special case of S&W's homoskedastic F-stat given by 7.14:
      F = [(R^2 - R^2 restricted)/q] / [(1 - R^2)/(n - k unrestricted - 1)], but it is the special case where q = k (the number of regressors in the unrestricted regression), such that: F = [(R^2 - 0)/k] / [(1 - R^2)/(n - k - 1)] = (R^2/k) / [(1 - R^2)/(n - k - 1)].
    • As my video illustrates (and its XLS demonstrates), this "overall" regression F-statistic is equivalently given by F=(ESS/df)/(RSS/df). That is, the overall F = [(R^2/k)]/[(1-R^2)/(n-k-1)] = (ESS/df)/(RSS/df), where k = ESS(df) = number of slope coefficients (excluding intercept), and as usual RSS(df) = n - k - 1.
  • The "more sophisticated" F test, discussed in my Reference #2 below, involves a joint test not of all the slope coefficients (as above) but rather of some subset of the coefficients; e.g., my example below refers to S&W's example regression that has three independent variables (PctEL, STR, and Expn) but only two restrictions (i.e., "the joint null is that both STR and Expn are equal to zero."). This is not an "overall" F test; this is called the homoskedasticity-only F-statistic and it utilizes the latter variants that you have listed.
That's all I have time for now. This is a rough draft meant to pull the conversation together into a coherent whole; later I will refine it for a better post, and for insertion into the study note. Thanks,
Reference #1 at https://www.bionicturtle.com/forum/threads/f-statistic.7676/ i.e.,
Hi Brian,
It's a smart question :)

The reason I didn't include F = (ESS/df)/(RSS/df) is that I don't think S&W show it. That is inexplicable, as it was the more familiar (and intuitive) formula before S&W replaced the previous, better econometric readings (I think S&W on the F-stat is *weak* and confusing).

F = (ESS/df)/(RSS/df) is the F-stat for the so-called "overall" regression; i.e., the test of the joint null that the coefficients on all regressors (independent variables) are equal to zero. This is the basic F-stat. We want to note that this is a special case of a restricted regression: the test of the joint null that all slope coefficients = 0 is equivalent to restricting all of the regression coefficients (i.e., q = number of independent variables). Again, the (common) overall regression F-stat is a special case of a restricted regression where the number of restrictions (q) is set equal to the number of independents (which is equal to the ESS df).

So this overall F-stat is a special case of S&W's homoskedastic F-stat given by 7.14:
F = [(R^2 - R^2 restricted)/q] / [(1 - R^2)/(n - k unrestricted - 1)], but the special case where q = k (the number of regressors in the unrestricted regression), such that:
F = [(R^2 - 0)/k] / [(1 - R^2)/(n - k - 1)] = (R^2/k) / [(1 - R^2)/(n - k - 1)].
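
To make the special case concrete, here is a minimal Python sketch of the two formulas just quoted. The function names and the toy inputs (ESS = 60, RSS = 40, n = 24, k = 3) are hypothetical and chosen only so the arithmetic is easy to follow; they do not come from S&W.

```python
def f_stat_from_r2(r2_unrestricted, r2_restricted, q, n, k):
    """Homoskedasticity-only F in its R^2 form (the 7.14 expression quoted above).

    q = number of restrictions, n = observations,
    k = number of regressors in the unrestricted regression.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator


def overall_f_from_r2(r2, n, k):
    """Overall F: restrict all k slopes, so q = k and the restricted R^2 is 0."""
    return f_stat_from_r2(r2, 0.0, q=k, n=n, k=k)


def overall_f_from_sums(ess, rss, n, k):
    """Overall F in the familiar (ESS/df)/(RSS/df) form."""
    return (ess / k) / (rss / (n - k - 1))


# Hypothetical toy numbers, for illustration only.
ess, rss, n, k = 60.0, 40.0, 24, 3
r2 = ess / (ess + rss)  # 0.6

f_r2 = overall_f_from_r2(r2, n, k)
f_sums = overall_f_from_sums(ess, rss, n, k)
print(f_r2, f_sums)
```

Both calls should return the same value (about 10 for these toy inputs), which is the point: the overall F is just the general formula with q = k and a restricted R^2 of zero.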

So, as far as I am concerned, there is one general F-stat and the difference is the number of restrictions. I'm not aware that the FRM has ever gone beyond the "overall" regression F-stat. (Given this, F should equal t^2 only when the unrestricted regression happens to have a single independent variable, in which case the overall F-stat tests the null that this one regressor's coefficient is zero; i.e., q = 1.) I hope that explains.
Reference #2 at https://www.bionicturtle.com/forum/threads/stock-watson-chap-7.13787/post-58778 i.e.,
Hi @FlorenceCC The F-statistic applies to the test of a joint hypothesis that several regression coefficients are equal to zero, according to the null. See our exhibit below, which replicates S&W's example. This is a regression with three independent variables such that TestScr = b0 + b1*PctEL + b2*Expn + b3*STR. The "overall regression" F-statistic is typically generated by the software; in my Excel below, =LINEST produces this overall regression F-stat = 107.455. But it can also be found with (ESS/df)/(RSS/df) = (66,410/3)/(85,700/416) = 107.455. It is potentially confusing because you might logically say that "this typical F-statistic is testing the joint null hypothesis that all three slope coefficients are equal to zero, which is the special case of three restrictions," but S&W are calling this an unrestricted regression. That is, even though the test imposes restrictions on all of the coefficients, the regression itself is the "unrestricted" regression! (which makes some sense actually)
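
As a quick arithmetic check, the snippet below recomputes the overall F-stat in Python. It uses the ESS of 66,410 and the unrestricted SSR of 85,700 quoted in this thread, together with k = 3 and 416 residual degrees of freedom; the variable names are my own, and n = 420 is only inferred from 416 = n - k - 1.

```python
# Inputs quoted in the thread for the unrestricted (three-regressor) regression.
ess = 66_410.0   # explained sum of squares
ssr = 85_700.0   # sum of squared residuals (SSR, a.k.a. RSS)
k = 3            # slope coefficients: PctEL, Expn, STR
df_resid = 416   # n - k - 1, which implies n = 420 (inferred, not stated)

# Overall F in sum-of-squares form.
f_from_sums = (ess / k) / (ssr / df_resid)

# Same statistic in R^2 form, with R^2 = ESS / (ESS + SSR).
r2 = ess / (ess + ssr)
f_from_r2 = (r2 / k) / ((1.0 - r2) / df_resid)

print(round(f_from_sums, 3), round(r2, 3), round(f_from_r2, 3))
```

Because the sums of squares are rounded to the nearest ten, the result should land very close to, rather than exactly on, the 107.455 that LINEST reports; the implied R^2 is consistent with the 0.437 cited for the unrestricted regression.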

Then, separately, in the exhibit below, there are homoskedasticity-only F-stats = 8.010, calculated as a function of SSR (like your equation above) and as a function of R^2, restricted versus unrestricted. This follows S&W's example. These 8.01 values are not a joint test of all three coefficients, but rather a joint test with only two (2) restrictions, q = 2: the joint null is that the coefficients on both STR and Expn are equal to zero. The F-stat of 8.01 uses as inputs either the unrestricted SSR (85,700) or the unrestricted R^2 (0.437), which are produced by the overall regression. I hope that helps clarify!
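
On why the SSR version and the R^2 version give the same number: the following is a short derivation, offered as a sketch. It assumes the restricted and unrestricted regressions share the same dependent variable and sample, so the total sum of squares (written TSS = SSR + ESS here, a symbol not used elsewhere in this thread) is identical in both. Subscripts r and u denote restricted and unrestricted.

```latex
\begin{align*}
R^2 &= 1 - \frac{SSR}{TSS}
  \quad\Longrightarrow\quad SSR = TSS\,(1 - R^2) \\[4pt]
SSR_r - SSR_u &= TSS\,(1 - R_r^2) - TSS\,(1 - R_u^2)
  = TSS\,(R_u^2 - R_r^2) \\[4pt]
F &= \frac{(SSR_r - SSR_u)/q}{SSR_u/(n-k-1)}
  = \frac{TSS\,(R_u^2 - R_r^2)/q}{TSS\,(1 - R_u^2)/(n-k-1)}
  = \frac{(R_u^2 - R_r^2)/q}{(1 - R_u^2)/(n-k-1)}
\end{align*}
```

The TSS cancels between numerator and denominator, which is why the order of the restricted and unrestricted terms flips when moving from SSR to R^2: a lower SSR corresponds to a higher R^2.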


Reference #3 is my recent YouTube video: The F ratio is a test of overall significance in a multivariate regression (FRM T2-20) which uses the regression below to illustrate
