How Can We Predict Outcome Events? Time for Answers!
By Soumyadip Pal
A good way to answer such a question is to use a decision tree: a flowchart with decision points at various levels and their possible consequences. Decision trees are essentially classification techniques that predict an outcome based on those decision points.
While this approach is helpful in many situations, it also leaves some questions unanswered. Consider two people who both end up with a “Good” classification under the loan risk decision tree. Can we assume that their risk profiles are identical? Could one be riskier than the other, even though both are classed as “Good”?
Similarly, going back to my question “Will it rain here tomorrow?”, a “no rain” prediction could still leave you drenched because you didn’t carry an umbrella. The model may have said “no” only because the predicted value was just a shade lower than the threshold for “yes”.
You’ll agree that it would be more helpful to get not only binary outcomes as predictions, but also the probability associated with each. A “no rain” prediction could mean a 0% chance of rain or a 10% chance; knowing the probability, you can decide for yourself whether to carry an umbrella.
Likewise, a personal loan borrower may be given a loan, but at a higher interest rate to compensate for a riskier profile, even though that profile lies within the “Good” range.
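To make the point concrete, here is a minimal sketch of what a binary label hides. The forecast values and the 0.5 threshold are made up for illustration and do not come from any real model.

```python
def classify(probability, threshold=0.5):
    """Collapse a predicted probability into a binary label.

    The 0.5 threshold is an assumption chosen for illustration.
    """
    return "rain" if probability >= threshold else "no rain"

# Hypothetical predicted chances of rain for three days.
forecasts = {"Monday": 0.02, "Tuesday": 0.49, "Wednesday": 0.51}

labels = {day: classify(p) for day, p in forecasts.items()}

for day, p in forecasts.items():
    # Monday and Tuesday get the same label despite very different probabilities.
    print(f"{day}: {labels[day]} (predicted chance of rain: {p:.0%})")
```

Monday and Tuesday are both labelled “no rain”, yet only the probabilities tell you that Tuesday is the day to carry an umbrella.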
These situations call for a regression technique known as logistic regression. Here the relationship between the input and the probability of the outcome is not a straight line, but an S-shaped curve.
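For reference, the S-shaped curve is usually written as the logistic function. The block below is a sketch for a single input; the symbols $x$ (the input), $p(x)$ (the predicted probability), and the coefficients $\beta_0$ and $\beta_1$ are notation assumed here, not taken from the article.

```latex
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}},
\qquad
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
```

The second form shows that the log-odds are linear in $x$, even though the probability itself follows an S curve bounded between 0 and 1.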
Instead of a technical discussion of why an S curve results, let us see if we can understand it empirically. Consider a student’s chances of gaining admission to one of the top colleges in India as a function of their 12th grade final marks. Three distinct possibilities emerge:
- Marks are so high that admission is almost a certainty, and a unit increase in marks does not result in a significant increase in the chances of admission.
- Marks are good, but not too high; the chances of admission are strongly linked to the marks. A unit increase in marks brings a roughly proportional increase in the chances of admission.
- Marks are so low that the chances of admission are almost nil, and a unit increase in marks would not significantly brighten those chances.
If you were to plot the above as a scatter plot, with marks on the x-axis and the chances of getting a college admission on the y-axis, the resulting curve would be S-shaped.
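The three regimes above can be sketched with a logistic curve. The marks scale (out of 100), the midpoint of 85, and the steepness of 0.4 are invented values for illustration only; they are not estimated from any admissions data.

```python
import math

def admission_chance(marks, midpoint=85.0, steepness=0.4):
    """Logistic curve mapping marks to a chance of admission.

    midpoint and steepness are hypothetical parameters, not fitted values.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (marks - midpoint)))

def gain_from_one_more_mark(marks):
    """Increase in the chance of admission from one additional mark."""
    return admission_chance(marks + 1) - admission_chance(marks)

# One sample point from each regime: very low, middling, very high.
for marks in (60, 85, 99):
    print(
        f"marks={marks}: chance={admission_chance(marks):.4f}, "
        f"gain from one more mark={gain_from_one_more_mark(marks):.4f}"
    )
```

Under these assumed parameters, an extra mark barely matters at 60 or 99, but moves the chance of admission noticeably around 85, which is what gives the curve its S shape.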
The logistic curve, which underlies the logit model, is not the only one with an S shape. The cumulative distribution function of the standard normal distribution is a similarly shaped curve, and it underlies the probit model. Logit and probit models yield similar, though not identical, results. Logit models are, overall, a lot more popular.
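A quick numerical comparison shows how close the two curves are. This is a sketch using only the Python standard library; the rescaling constant 1.702 is a commonly quoted approximation factor and is not part of the article.

```python
import math

def logistic_cdf(x):
    """Standard logistic curve, the S curve behind the logit model."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    """Standard normal CDF, the S curve behind the probit model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Evaluate both curves on a grid from -4 to 4 in steps of 0.1.
grid = [i / 10 for i in range(-40, 41)]

# Largest gap between the two curves on the same x scale.
max_gap_raw = max(abs(logistic_cdf(x) - normal_cdf(x)) for x in grid)

# Largest gap after stretching the logistic input by 1.702 (assumed constant).
max_gap_scaled = max(abs(logistic_cdf(1.702 * x) - normal_cdf(x)) for x in grid)

print(f"largest gap without rescaling: {max_gap_raw:.4f}")
print(f"largest gap with rescaling:    {max_gap_scaled:.4f}")
```

Once the horizontal scale is matched, the two curves nearly coincide, which is why logit and probit models tend to give similar predictions while reporting coefficients on different scales.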
Soumyadip Pal is a retail analytics professional and a passionate educator with more than 8 years in the industry and more than 7 years in academia, currently working as a consultant with Manipal ProLearn.