
Credit scoring


Hi @David Harper CFA FRM

I am not able to fully understand the graph below. I do understand the model's curve and why it slopes upward at a decreasing rate: it is a cumulative distribution curve. If we move 1% from 9% to 10%, which falls in the high-risk zone, the fraction of defaults captured is larger than when we move the same 1% from 90% to 91%, which falls in the low-risk zone. The cumulative increase is therefore larger in the high-risk zone than in the low-risk zone, which gives the model's curve its shape.

What I didn't understand is: "assume that the scoring model predicts that 10 percent of the accounts will default in the next 12 months". Where is the time component in this graph?

If the blue line represents a random model and the gray line represents our current model, does the red line represent the perfect model? If not, what would the perfect model's line look like?

In a perfect model, if we move from 0% to 10% on the x-axis, will the curve show the corresponding 10% on the y-axis?

How will the curve of the perfect model look?

[Attached image: credit scoring question.png]

David Harper CFA FRM

Hi @Jaskarn Because it's not easy to understand the CAP, I started to build a version in Excel so we can add it to our learning XLS library (and include it in the materials, of course). The first draft is very basic but you can see it here https://www.dropbox.com/s/a1y2o7n4rr3k731/021519-cumulative-accuracy-ratio.xlsx?dl=0 and a screenshot is below (I just noticed that I misspelled Cumulative, btw)

I think it may be easier to understand with a concrete example, and I have attempted here the simplest example that I can think of. Please note that time is not a dimension in the CAP. My dataset includes a nice round 100 borrowers (1 per row, starting at ID = 1 and going to ID = 100). I have sorted them and, very unrealistically but conveniently for my illustration, their credit scores happen to be in sequence. The actual model would sort them too, so the sorting isn't unrealistic; it's just that my borrowers happen to have credit scores of 660, 661, 662, .... Consistent with Crouhy, I selected a "cutoff" at 680. So, the model predicts that the borrower will default (i.e., Pred = 1) if their credit score is less than 680, or it predicts non-default (i.e., Pred = 0) if their credit score is 680 or greater.
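To make the cutoff rule concrete, here is a minimal Python sketch of the toy dataset described above. It is not the spreadsheet itself: the names `CUTOFF`, `borrowers`, and `n_pred_defaults` are made up for illustration, and it assumes that ID = 1 carries the score 660 and that scores rise by one per borrower.

```python
# Hypothetical reconstruction of the toy dataset (not the actual spreadsheet).
# Assumption: ID 1 has score 660 and each later ID scores one point higher.
CUTOFF = 680

# Lowest score first, i.e. sorted from highest risk to lowest risk.
borrowers = [{"id": i, "score": 659 + i} for i in range(1, 101)]

for b in borrowers:
    # Pred = 1 (default) if the score is below the cutoff, else Pred = 0.
    b["pred"] = 1 if b["score"] < CUTOFF else 0

n_pred_defaults = sum(b["pred"] for b in borrowers)
print(n_pred_defaults)
```

Under these assumptions the scores 660 through 679 fall below the cutoff, so the model flags the first 20 borrowers as defaults.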

The final three columns in my model are:
  • The first column Model Prediction: If Score < 680, value = 1; otherwise value = 0
  • The second column represents a Perfect Model: value is the same as Model Prediction
  • The third column ("Rand") is a random outcome: =INT(RAND()*2), which generates a random 0 or 1.
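The three columns can be sketched in Python as follows. This is only an approximation of the spreadsheet logic: `build_columns` and its parameters are hypothetical names, the scores are assumed to run from 660 to 759, and `int(rng.random() * 2)` stands in for `=INT(RAND()*2)`.

```python
import random


def build_columns(scores, cutoff=680, seed=0):
    """Return (model_pred, perfect, rand) as lists of 0/1 values.

    Hypothetical helper mirroring the three spreadsheet columns.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # Model Prediction: 1 if Score < cutoff, otherwise 0.
    model_pred = [1 if s < cutoff else 0 for s in scores]
    # Perfect Model: the same values as Model Prediction.
    perfect = list(model_pred)
    # Rand: a random 0 or 1, analogous to =INT(RAND()*2).
    rand = [int(rng.random() * 2) for _ in scores]
    return model_pred, perfect, rand


# Assumed scores 660..759, one per borrower, highest risk first.
scores = list(range(660, 760))
model_pred, perfect, rand = build_columns(scores)
```

Unlike the spreadsheet, which redraws the random column on every recalculation, the seed here fixes one random trial so the result can be repeated.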
Then the CAP chart plots:
  • The X-axis is simply (ID/100) because, per Crouhy, "On the horizontal axis are the population sorted by score from the highest risk score to the lowest risk score." I have 100 borrowers so mine is conveniently 1%, 2%, ... If there were 500 borrowers, they would be sorted the same way and the X-axis values would be 1/500 = 0.20%, 2/500 = 0.40%, .... 499/500, 500/500.
  • The Y-axis is given by (the cumulative number of actual defaults so far / the total number of actual defaults), per Crouhy's not-very-well-explained "On the vertical axis are the actual defaults in percentage terms taken from the bank’s records." For example, when we get to the 10th borrower in my random simulation, one of my random trials has a value of 8% because in that trial 8 of the worst 10 borrowers actually defaulted, among 52 total that actually defaulted. So this is the cumulative percentage that have actually defaulted.
  • We do not expect my random simulation (in red) to correspond exactly to the 45-degree line, which represents an unrealistically "perfectly bad" model. In a perfectly bad model, when we get to the (sorted) worst 10 borrowers (10/100 = 10%), our model predicts they all default but only half of them actually default (which is 5 / 50 or 10%).
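The two axes can be computed with a few lines of Python. This is a sketch under stated assumptions rather than the spreadsheet's formulas: `cap_curve` is a hypothetical helper, and the two example outcome lists are stylized inputs (alternating defaults for the "perfectly bad" case, and the first 20 borrowers defaulting for the case where outcomes match the model's predictions).

```python
def cap_curve(outcomes):
    """Return CAP coordinates (x, y) for 0/1 default outcomes.

    `outcomes` must already be sorted from highest risk to lowest risk.
    x[k] is the share of the population counted so far;
    y[k] is the share of all actual defaults captured so far.
    """
    total_defaults = sum(outcomes)
    if total_defaults == 0:
        raise ValueError("need at least one default to build a CAP curve")
    n = len(outcomes)
    x, y, cumulative = [], [], 0
    for k, defaulted in enumerate(outcomes, start=1):
        cumulative += defaulted
        x.append(k / n)
        y.append(cumulative / total_defaults)
    return x, y


# Stylized "perfectly bad" case: every other borrower defaults (50 of 100).
x_bad, y_bad = cap_curve([1, 0] * 50)

# Assumed case where outcomes equal the predictions: the first 20 default.
x_perf, y_perf = cap_curve([1] * 20 + [0] * 80)

print(x_bad[9], y_bad[9])    # 10th borrower: 10% of population, 5/50 of defaults
print(x_perf[9], y_perf[9])  # 10th borrower: 10% of population, 10/20 of defaults
```

In the first case the curve tracks the 45-degree line, matching the 5 / 50 = 10% figure above. In the second case the curve climbs to 100% at the 20th borrower and stays flat afterwards.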
Once these axes are understood, I think the ratio is a cinch. Let me know what you think, thanks,