NYU Stern MSBAI Capstone
DriftBreaker
Credit Risk Model Drift Detection Using Survival Analysis
Framework Architecture
BI Analytics
Portfolio metrics, originations, defaults, book composition, cumulative PD curves
Model Engine v24.0
Champion GBM + Challenger Logit, Isotonic calibration, EWMA macro overlay, Forensic Portfolio Engine
Strategy
Financial Confusion Matrix, NPV-based P&L, Loss Estimation (EL=EAD×PD×LGD), Decision Matrix
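The expected-loss identity in the Strategy layer is simple enough to sketch directly. This is a minimal example that assumes per-loan EAD and horizon PD are already available. The 70% LGD default comes from the Financial Parameters listed later, and the two loans are made-up values.

```python
from dataclasses import dataclass

# Default LGD taken from the framework's Financial Parameters (70%).
DEFAULT_LGD = 0.70


@dataclass
class Exposure:
    ead: float                 # exposure at default, in dollars
    pd: float                  # probability of default over the horizon, in [0, 1]
    lgd: float = DEFAULT_LGD   # loss given default, in [0, 1]


def expected_loss(e: Exposure) -> float:
    """EL = EAD x PD x LGD for a single exposure."""
    if not (0.0 <= e.pd <= 1.0 and 0.0 <= e.lgd <= 1.0):
        raise ValueError("PD and LGD must lie in [0, 1]")
    return e.ead * e.pd * e.lgd


def portfolio_expected_loss(exposures) -> float:
    """Expected loss is additive across exposures."""
    return sum(expected_loss(e) for e in exposures)


# Illustrative (made-up) exposures.
book = [Exposure(ead=10_000, pd=0.30), Exposure(ead=20_000, pd=0.05)]
print(portfolio_expected_loss(book))
```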
Project Objectives
- Detect Model Drift: Build a framework to identify when credit risk models deviate from expected performance using survival analysis techniques.
- Attribution Analysis: Decompose drift into macro (economic) and micro (underwriting) components to inform remediation strategies.
- P&L Impact: Translate model drift into financial terms — margin compression, variance attribution, and required rate calculations.
- Actionable Decisions: Provide clear EXIT / REPRICE / MONITOR recommendations by segment based on quantitative analysis.
Lending Club
Pioneer in Peer-to-Peer Lending
Lending Club was founded in 2006 and became the world's largest peer-to-peer lending platform, facilitating over $50 billion in loans before transitioning to a neobank model in 2020. The platform connected borrowers seeking personal loans with investors looking for yield, disrupting traditional banking by removing the intermediary.
Company Timeline
Platform Statistics
Grading System
Lending Club assigned grades A through G to borrowers based on creditworthiness, with subgrades 1-5 within each letter grade. This risk stratification determined interest rates and was central to investor decision-making.
Portfolio Summary
Lending Club data (2007-2018)
By Segment
| Segment | Loans | Exposure | Default Rate (Count) | Default Rate ($) |
|---|---|---|---|---|
| High Risk | 189,607 | $3,411.7M | 28.71% | 29.77% |
| Medium Risk | 974,477 | $14,869.6M | 14.68% | 14.65% |
| Low Risk | 1,096,584 | $15,722.9M | 5.92% | 5.60% |
By Term
| Term | Loans | Exposure | Default Rate |
|---|---|---|---|
| 36 months | 1,609,754 | $20,513.0M | 10.70% |
| 60 months | 650,914 | $13,491.2M | 17.15% |
Timing Metrics
Portfolio Composition Evolution
Stacked area chart showing portfolio mix shift over time. Low Risk grew from 40% (2007) to 56% (2018), while High Risk declined from 13% to 5%.
Originations vs Defaults
Annual and cumulative volume analysis
Chart shows ultimate default rates by vintage year. Color coding: Red >18% | Yellow >16% | Green <5% | Blue = Normal
Originations vs. Default Dollars by Year
Dual-axis chart: bars show origination volume (left axis), and the line shows defaulted dollar amounts (right axis). Defaulted dollars peak in 2015 ($1.17B on $6.4B of originations). Origination volume kept growing through 2018 ($7.94B).
Annual Metrics
| Vintage | Loans | Volume | Defaults | Default Rate (Count) |
|---|---|---|---|---|
| 2007 | 251 | $2.2M | 45 | 17.93% |
| 2008 | 1,562 | $14.4M | 247 | 15.81% |
| 2009 | 4,716 | $46.4M | 594 | 12.60% |
| 2010 | 11,536 | $122.1M | 1,487 | 12.89% |
| 2011 | 21,721 | $261.7M | 3,297 | 15.18% |
| 2012 | 53,367 | $718.4M | 8,644 | 16.20% |
| 2013 | 134,814 | $1.98B | 21,030 | 15.60% |
| 2014 | 235,629 | $3.50B | 41,408 | 17.57% |
| 2015 | 421,095 | $6.42B | 76,851 | 18.25% |
| 2016 | 434,407 | $6.40B | 71,666 | 16.50% |
| 2017 | 443,579 | $6.59B | 44,854 | 10.11% |
| 2018 | 495,242 | $7.94B | 13,460 | 2.72% |
Default Analysis
Comprehensive default timing, curves, and early warning indicators
Cumulative Default Curves by Vintage
Each line represents a vintage cohort's cumulative default rate over 36 months. 2015-2016 vintages show steep acceleration post-month 12.
Defaulted Dollar Amounts by Vintage
Bar chart shows actual dollar amount of defaults by origination year. 2015 had highest default dollars ($1.17B), driven by both volume and elevated default rate.
Monthly Hazard Rate (Conditional Default Probability)
Hazard rate = P(default in month t | survived to month t). Peak around months 13-16, then declining.
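The empirical hazard in month t is the number of defaults in month t divided by the number of loans still at risk at the start of month t. This is a minimal sketch. It assumes that paid-off or censored loans simply leave the risk set after their last observed month, and that a default is booked in a loan's final month, in line with the engine's final-month default assignment. The five-loan cohort is hypothetical.

```python
from typing import Dict, List, Tuple


def monthly_hazard(events: List[Tuple[int, bool]]) -> Dict[int, float]:
    """Empirical discrete-time hazard h(t) = d_t / n_t.

    events: one (last_month_observed, defaulted) pair per loan, with
    last_month_observed >= 1. A loan is at risk in every month up to and
    including its last observed month, and contributes a default only in that
    final month when defaulted is True (otherwise it paid off or is censored).
    """
    max_month = max(m for m, _ in events)
    hazard = {}
    for t in range(1, max_month + 1):
        at_risk = sum(1 for m, _ in events if m >= t)
        defaults = sum(1 for m, d in events if m == t and d)
        hazard[t] = defaults / at_risk if at_risk else 0.0
    return hazard


def survival_curve(hazard: Dict[int, float]) -> Dict[int, float]:
    """S(t) = prod_{k <= t} (1 - h(k)); cumulative default rate = 1 - S(t)."""
    surv, s = {}, 1.0
    for t in sorted(hazard):
        s *= 1.0 - hazard[t]
        surv[t] = s
    return surv


# Hypothetical cohort of five loans: (last observed month, defaulted?).
loans = [(2, True), (3, False), (3, True), (5, False), (5, False)]
h = monthly_hazard(loans)
print(h)
print(survival_curve(h))
```

Because the conditional hazard divides by the shrinking risk set, month 3 has a higher hazard (1 of 4) than month 2 (1 of 5) even though each month has one default.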
Default Distribution by Time Period
Histogram showing concentration of defaults by month bucket.
Median time to default (TTD): High Risk loans default 3 months earlier than Low Risk loans.
Cumulative Default Timing
| Timeframe | Defaults | % of Total | Cumulative % |
|---|---|---|---|
| 1-3 months | 11,859 | 4.18% | 4.18% |
| 4-6 months | 28,837 | 10.17% | 14.35% |
| 7-9 months | 35,058 | 12.36% | 26.71% |
| 10-12 months | 39,964 | 14.09% | 40.80% |
| 13-18 months (Peak) | 59,131 | 20.85% | 61.65% |
| 19-24 months | 45,914 | 16.19% | 77.84% |
| 25-36 months | 49,145 | 17.33% | 95.17% |
| 37+ months | 11,246 | 3.97% | 99.14% |
Early Defaults (PD12) Analysis
PD12 by Vintage
2016 had the worst early default rate (6.68%) and 2018 the best (2.57%).
Segment × Vintage Heatmap
High Risk 2016: 17.93% early default rate.
Vintage Performance Comparison
| Vintage | Loans | Volume | PD12 | PD24 | PD36 | Ultimate DR |
|---|---|---|---|---|---|---|
Drift Detection
Population Stability Index and performance monitoring
A/E Ratio Over Time
The A/E ratio declines from 1.05 (2016Q1) to 0.45 (2018Q4). This shows progressive model over-prediction: by 2018Q4, observed defaults are less than half of expected defaults.
| Metric | Threshold | Current | Status |
|---|---|---|---|
| PSI (Overall) | < 0.10 | 0.08 | MINOR SHIFT |
| A/E Ratio | 0.8 - 1.2 | 0.45x | CRITICAL |
| KS Statistic | < 0.05 | 0.04 | STABLE |
| Gini Coefficient | > 0.30 | 0.33 | ACCEPTABLE |
| Brier Score | < 0.10 | 0.072 | GOOD |
Drift Detection Techniques (Dirty Dozen)
Status Classification: The system aggregates metrics to classify drift as STABLE, WARNING, or CRITICAL.
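The threshold table lists a limit per metric but does not say how the metrics are combined into one status. The sketch below shows one possible reading, and it rests on two assumptions. First, each metric is STABLE inside its limit, WARNING if it breaches by at most a hypothetical 25% of the limit, and CRITICAL beyond that. Second, the overall status is the worst individual result. Boundary values are treated as passing. The metric values are the "Current" column of the table.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY = {"STABLE": 0, "WARNING": 1, "CRITICAL": 2}
TOLERANCE = 0.25  # hypothetical: breaches within 25% of the limit are WARNING


@dataclass
class Check:
    name: str
    value: float
    lower: Optional[float] = None  # metric must be >= lower (if set)
    upper: Optional[float] = None  # metric must be <= upper (if set)


def classify(check: Check) -> str:
    """Map one metric to STABLE / WARNING / CRITICAL."""
    breach = 0.0  # relative distance outside the allowed range
    if check.lower is not None and check.value < check.lower:
        breach = (check.lower - check.value) / abs(check.lower)
    if check.upper is not None and check.value > check.upper:
        breach = (check.value - check.upper) / abs(check.upper)
    if breach == 0.0:
        return "STABLE"
    return "WARNING" if breach <= TOLERANCE else "CRITICAL"


def overall_status(checks) -> str:
    """Worst-of aggregation across metrics (an assumed rule)."""
    return max((classify(c) for c in checks), key=SEVERITY.__getitem__)


checks = [
    Check("PSI", 0.08, upper=0.10),
    Check("A/E", 0.45, lower=0.8, upper=1.2),
    Check("KS", 0.04, upper=0.05),
    Check("Gini", 0.33, lower=0.30),
    Check("Brier", 0.072, upper=0.10),
]
for c in checks:
    print(c.name, classify(c))
print("Overall:", overall_status(checks))
```

Under this rule, the A/E ratio of 0.45 breaches the 0.8 floor by about 44% of the limit. That alone drives the overall status to CRITICAL, which matches the table.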
Macro/Micro Attribution
Decomposing drift sources into economic vs underwriting factors
Drift Attribution Waterfall
Waterfall shows how base default rate (11.61%) is impacted by macro and micro factors to reach observed rate. Fed-calibrated coefficients: β_unemp=+0.12/σ, β_spread=+0.16/σ.
Attribution Methodology
When model drift is detected, the system decomposes the total drift into two components using log-space decomposition:
macro_multiplier = avg_macro_scalar
micro_multiplier = total_drift / macro_multiplier
log(total) = log(macro) + log(micro)
macro_attribution_pct = (log(macro) / log(total)) × 100
micro_attribution_pct = (log(micro) / log(total)) × 100
Total Drift: Ratio of drifted hazard rate (current period) to base hazard rate (baseline). Captures overall change in default probability.
Macro Attribution: Percentage of drift explained by economic conditions (unemployment, spreads, sentiment, etc.). Reflects external economic stress affecting all borrowers.
Micro Attribution: Percentage of drift explained by underwriting/population shifts. Reflects changes in borrower quality, underwriting standards, or product mix.
Example Interpretation: If macro attribution = 35% and micro attribution = 65%, economic stress accounts for 35% of increased defaults, while 65% is due to borrower quality changes or underwriting deterioration.
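The log-space split can be written as a small function. This is a minimal sketch. The inputs are hypothetical and were chosen so that the macro share comes out at 35%, mirroring the example interpretation. The guard for a total drift of exactly 1 is an addition, because log(total) is zero there and the percentages are undefined.

```python
import math


def attribute_drift(total_drift: float, avg_macro_scalar: float):
    """Log-space split of a hazard multiplier into macro and micro shares.

    total_drift: drifted hazard / base hazard (e.g. 1.20 = 20% higher).
    avg_macro_scalar: average macro overlay multiplier over the period.
    Returns (macro_multiplier, micro_multiplier, macro_pct, micro_pct).
    """
    if total_drift <= 0 or avg_macro_scalar <= 0:
        raise ValueError("multipliers must be positive")
    if math.isclose(total_drift, 1.0):
        raise ValueError("no drift to attribute: log(total) is 0")
    macro = avg_macro_scalar
    micro = total_drift / macro          # so that macro * micro == total_drift
    log_total = math.log(total_drift)    # equals log(macro) + log(micro)
    macro_pct = 100.0 * math.log(macro) / log_total
    micro_pct = 100.0 * math.log(micro) / log_total
    # The shares always sum to 100, but either one can fall outside [0, 100]
    # when the macro and micro multipliers move in opposite directions.
    return macro, micro, macro_pct, micro_pct


# Hypothetical inputs: 20% hazard drift, macro scalar chosen to explain 35%.
macro, micro, macro_pct, micro_pct = attribute_drift(1.20, 1.20 ** 0.35)
print(f"{macro_pct:.1f}% macro, {micro_pct:.1f}% micro")  # 35.0% macro, 65.0% micro
```

Working in logs makes the two multipliers additive, so the split does not depend on which factor is applied first.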
Macro Factors (35%)
Economic conditions affecting all borrowers uniformly
- Unemployment rate: +0.5pp
- HY Spread: +50bps
- Consumer sentiment: -5pts
Micro Factors (65%)
Underwriting and population characteristics
- DTI loosening: 38% → 43%
- Grade D/E share: +8%
- Avg income: -$5K
Model Methodology
Forensic Portfolio Engine - Architecture & Key Assumptions (Grade Excluded from Features)
Three-Component Framework
1. Micro Scorecard
- Champion: GBM (HistGradientBoostingClassifier)
- Challenger: Logistic Regression
- 9 features (Grade EXCLUDED)
- Isotonic + Segment-specific calibration
2. Vintage Curve
- Empirical default rate by quarter
- 3-month discrete periods
- Panel expansion for survival
- Period categories: Y1-Y5+
3. Macro Overlay
- 3 indicators: TDSP, UNRATE, CORCACBS
- EWMA smoothing (α=0.3)
- 1-quarter lag effect
- Soft cap: 0.8x - 1.25x scalar
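The overlay steps can be sketched on a quarterly series of macro stress multipliers: EWMA smoothing with α=0.3, a one-quarter lag, and a 0.8x to 1.25x cap. This sketch makes two assumptions. First, the input is already a raw multiplier per quarter; how TDSP, UNRATE and CORCACBS are combined into it is not shown. Second, the "soft cap" is approximated with a hard clip, although the engine's actual cap function may be smoother.

```python
from typing import List, Optional

ALPHA = 0.3                   # EWMA smoothing weight from the overlay spec
CAP_LO, CAP_HI = 0.80, 1.25   # scalar bounds from the overlay spec
LAG_QUARTERS = 1              # conditions in quarter q affect quarter q + 1


def ewma(series: List[float], alpha: float = ALPHA) -> List[float]:
    """s_0 = x_0; s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    out: List[float] = []
    for x in series:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def macro_scalars(raw: List[float]) -> List[Optional[float]]:
    """Smooth, lag by LAG_QUARTERS, then clip into [CAP_LO, CAP_HI].

    The first LAG_QUARTERS entries are None because no lagged value exists.
    """
    smoothed = ewma(raw)
    kept = smoothed[: len(smoothed) - LAG_QUARTERS]
    lagged: List[Optional[float]] = [None] * LAG_QUARTERS + kept
    # Hard clip used as a stand-in for the engine's "soft cap" (assumption).
    return [None if s is None else min(CAP_HI, max(CAP_LO, s)) for s in lagged]


raw = [1.00, 1.50, 2.00, 1.20]   # hypothetical raw stress multipliers
print(macro_scalars(raw))
```

Under the multiplicative-independence assumption, each quarter's scalar would then multiply the micro and vintage hazard for that quarter.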
Key Assumptions
1. Multiplicative Independence
Assumes independence between micro, vintage, and macro components. No interaction terms modeled.
2. Quarterly Discretization
Continuous time converted to discrete quarters. Months on book (MOB) mapped to quarters (q = floor(MOB/3)).
3. Fixed 1-Quarter Lag
Economic conditions affect loans issued in the next quarter. Q1 conditions impact Q2 originations.
4. Forensic Date Accuracy
Assumes `last_pymnt_d` accurately reflects true loan duration. Captures early payoffs and extended terms (1-60 months).
5. Default Assignment at Final Month
Defaults assigned only at the final month of loan duration, not throughout the loan lifecycle.
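Assumptions 1, 2 and 5 together describe a person-period (panel) expansion. Each loan becomes one row per quarter on book under q = floor(MOB/3), and its default flag is set only in its final period. The components then combine multiplicatively. This is a minimal sketch. The record fields (`loan_id`, `mob_final`, `charged_off`) and the helper `combined_hazard` are hypothetical names, not the engine's schema. The helper treats the micro score as a base quarterly hazard and the vintage and macro terms as multipliers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Loan:
    loan_id: str
    mob_final: int      # months on book at last payment (1 to 60)
    charged_off: bool   # terminal status


@dataclass
class PanelRow:
    loan_id: str
    quarter: int
    default: int        # 1 only in the loan's final quarter, if charged off


def to_quarter(mob: int) -> int:
    """Quarterly discretization: q = floor(MOB / 3)."""
    return mob // 3


def expand_panel(loans: List[Loan]) -> List[PanelRow]:
    """One row per quarter on book; default flagged at the final quarter only."""
    rows: List[PanelRow] = []
    for loan in loans:
        last_q = to_quarter(loan.mob_final)
        for q in range(last_q + 1):
            flag = int(q == last_q and loan.charged_off)
            rows.append(PanelRow(loan.loan_id, q, flag))
    return rows


def combined_hazard(micro: float, vintage: float, macro: float) -> float:
    """Multiplicative independence: no interaction terms, capped at 1."""
    return min(1.0, micro * vintage * macro)


panel = expand_panel([Loan("A", 7, True), Loan("B", 4, False)])
for row in panel:
    print(row)
```

Loan A (MOB 7) expands to quarters 0 to 2 with the default in quarter 2. Loan B (MOB 4) contributes two non-default rows.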
Key Facts
Feature Set (9 Features - Grade EXCLUDED)
- Numeric (6): loan_amnt, dti, annual_inc, revol_util, emp_length_int, pti
- Categorical (3): period_cat, home_ownership, purpose
- Note: Grade is used for risk segmentation only, NOT as a model feature
- Engineered: pti = installment / (monthly_income + 1)
Based on Forensic Portfolio Engine methodology
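The engineered pti feature is a one-liner. The sketch assumes monthly_income = annual_inc / 12, since the definition of monthly income is not stated. The +1 in the denominator keeps the ratio finite for zero reported income.

```python
def payment_to_income(installment: float, annual_inc: float) -> float:
    """pti = installment / (monthly_income + 1).

    monthly_income is assumed to be annual_inc / 12; the +1 guards against
    division by zero when reported income is 0.
    """
    monthly_income = annual_inc / 12.0
    return installment / (monthly_income + 1.0)


print(round(payment_to_income(installment=330.0, annual_inc=60_000), 4))  # 0.066
```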
Macro Indicators (3) + Financial Parameters
- UNRATE: Unemployment Rate (lagged 1Q)
- TDSP: Household Debt Service Payments as a Percent of Disposable Personal Income
- CORCACBS: Charge-Off Rate on Consumer Loans, All Commercial Banks (lagged 1Q)
Financial Parameters
- LGD: 70%
- Required Return: 6%
- Fee Servicing: 1%
- Fee Collection: 18%
- Recovery Lag: 6 months
Model Training Details
Champion: GBM (HistGradientBoostingClassifier)
- max_iter = 300
- learning_rate = 0.05
- max_depth = 6
- early_stopping = True
- scoring = 'neg_log_loss'
- n_iter_no_change = 20
Challenger: Logistic Regression
- max_iter = 300
- solver = 'saga'
- n_jobs = -1 (parallel)
Calibration
- Isotonic regression (CalibratedClassifierCV)
- Segment-specific calibrators (low/med/high risk)
- Min 1000 samples per segment
Economics & NPV Calculation
Financial Confusion Matrix
The warn threshold is set at the 90th percentile of horizon PD. Only terminal loans (Fully Paid, Charged Off) are evaluated; active loans are excluded.
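This sketch shows the counting step. It computes the 90th-percentile cut on horizon PD over terminal loans only, flags loans strictly above it, and cross-tabulates the flags against realized charge-offs. Three choices are assumptions: the (horizon_pd, status) record layout, the nearest-rank percentile definition, and the strict inequality. The engine may interpolate quantiles differently, and its dollar weighting of the cells is not shown.

```python
import math
from typing import Dict, List, Tuple

TERMINAL = {"Fully Paid", "Charged Off"}  # active loans are excluded


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of values, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def financial_confusion_matrix(
    loans: List[Tuple[float, str]], pct: float = 90.0
) -> Tuple[float, Dict[str, int]]:
    """loans: (horizon_pd, loan_status) pairs.

    Returns the warn threshold and TP/FP/FN/TN counts, where a loan is
    "warned" if its PD is strictly above the threshold and "positive" if it
    charged off.
    """
    terminal = [(p, s) for p, s in loans if s in TERMINAL]
    threshold = nearest_rank_percentile([p for p, _ in terminal], pct)
    cells = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, status in terminal:
        warned = p > threshold
        defaulted = status == "Charged Off"
        key = ("T" if warned == defaulted else "F") + ("P" if warned else "N")
        cells[key] += 1
    return threshold, cells


sample = [
    (0.02, "Fully Paid"), (0.03, "Fully Paid"), (0.05, "Charged Off"),
    (0.04, "Fully Paid"), (0.06, "Fully Paid"), (0.08, "Fully Paid"),
    (0.10, "Charged Off"), (0.12, "Fully Paid"), (0.15, "Fully Paid"),
    (0.30, "Charged Off"), (0.50, "Current"),  # active loan: dropped
]
print(financial_confusion_matrix(sample))
```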
Drift Detection & Diagnostics
The system computes multiple statistical metrics to detect distribution shift and model performance:
Core Drift Metrics
- PSI - Population Stability Index (10 bins)
- KS Statistic - Kolmogorov-Smirnov distance
- Mean Shift - Δ in predicted probabilities
- Variance Ratio - σ²(curr) / σ²(base)
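This is a minimal sketch of two of the core metrics: PSI over 10 bins and the two-sample KS distance. It makes two assumptions. Bins are cut at baseline deciles, and empty bins are floored at a small eps so the log stays finite. The engine's exact binning and smoothing are not documented here.

```python
import bisect
import math
from typing import List


def decile_edges(baseline: List[float], n_bins: int = 10) -> List[float]:
    """Interior bin edges at baseline quantiles (n_bins - 1 cut points)."""
    ordered = sorted(baseline)
    return [ordered[int(len(ordered) * k / n_bins)] for k in range(1, n_bins)]


def psi(baseline: List[float], current: List[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum_i (a_i - e_i) * ln(a_i / e_i) over the bins."""
    edges = decile_edges(baseline, n_bins)

    def shares(values: List[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(baseline), shares(current)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def ks_statistic(baseline: List[float], current: List[float]) -> float:
    """Two-sample KS: max |F_base(x) - F_curr(x)| over all observed points."""
    b, c = sorted(baseline), sorted(current)
    points = sorted(set(b) | set(c))
    return max(
        abs(bisect.bisect_right(b, x) / len(b) - bisect.bisect_right(c, x) / len(c))
        for x in points
    )


base = [i / 1000 for i in range(1000)]
shifted = [min(x + 0.2, 0.999) for x in base]  # hypothetical drifted scores
print(psi(base, base), psi(base, shifted), ks_statistic(base, shifted))
```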
Performance Diagnostics
- Horizon AUC - Loan-level discrimination
- Horizon PR-AUC - Precision-Recall
- A/E Ratio - Actual vs Expected defaults
- Overprediction Factor - Calibration check
A/E Interpretation: 1.0 = Perfect | >1.2 = Underpredicting | <0.8 = Overpredicting
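The A/E ratio and its interpretation bands are simple to sketch. The sketch assumes "expected" means the sum of predicted PDs over the same loans and horizon as the observed default count. The 1,000-loan example is hypothetical.

```python
from typing import Iterable


def ae_ratio(actual_defaults: float, expected_pds: Iterable[float]) -> float:
    """A/E = observed default count / sum of predicted PDs."""
    expected = sum(expected_pds)
    if expected <= 0:
        raise ValueError("expected defaults must be positive")
    return actual_defaults / expected


def interpret_ae(ratio: float) -> str:
    """Bands from the dashboard: >1.2 under-, <0.8 over-predicting."""
    if ratio > 1.2:
        return "Underpredicting"
    if ratio < 0.8:
        return "Overpredicting"
    return "Within tolerance"


# Hypothetical: 1,000 loans at PD 0.10 each, 45 observed defaults.
ratio = ae_ratio(45, [0.10] * 1000)
print(round(ratio, 2), interpret_ae(ratio))
```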
Leakage Audit
- ID Disjointness: Verifies no loan IDs overlap between train/cal/test sets
- Permutation Test: Shuffled labels should yield AUC ≈ 0.5 (no signal from leakage)
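Both audit checks can be sketched with the standard library alone. AUC is computed here through the Mann-Whitney pairwise-comparison form, with ties counting as 0.5. The function names, the 200 shuffle rounds and the fixed seed are illustrative choices, not the engine's settings.

```python
import random
from typing import Iterable, Sequence


def ids_disjoint(*splits: Iterable[str]) -> bool:
    """True if no loan ID appears in more than one split."""
    seen: set = set()
    for split in splits:
        ids = set(split)
        if seen & ids:
            return False
        seen |= ids
    return True


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_auc(labels: Sequence[int], scores: Sequence[float],
                    n_rounds: int = 200, seed: int = 0) -> float:
    """Mean AUC after shuffling labels; should sit near 0.5 without leakage."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_rounds):
        rng.shuffle(shuffled)
        total += auc(shuffled, scores)
    return total / n_rounds


scores = [float(i) for i in range(100)]
labels = [1 if i >= 70 else 0 for i in range(100)]  # perfectly separable toy data
print(auc(labels, scores), permutation_auc(labels, scores))
```

A shuffled-label AUC well away from 0.5 would suggest that the scores carry information unrelated to the true labels, for example through a leaked identifier.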
P&L Analysis
NPV-based margin analysis with time-aware discounting (LGD: 70%, Required Return: 6%)
Margin Waterfall (High Risk Segment)
Hazard & Margin Analysis by Segment
| Segment | Quarterly Hazard (Base) | Quarterly Hazard (Drifted) | Annual PD (Drifted) | Coupon | Margin | Required Rate |
|---|---|---|---|---|---|---|
| High Risk | 7.14% | 8.57% | 30.1% | 14.0% | -22.1% | 36.1% |
| Medium Risk | 3.27% | 3.69% | 14.0% | 12.0% | -8.0% | 20.0% |
| Low Risk | 1.23% | 1.34% | 5.3% | 9.0% | -2.3% | 11.3% |
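The table's columns appear to follow from one another. Annual PD = 1 - (1 - drifted quarterly hazard)^4, required rate = annual PD + the 6% required return, and margin = coupon - required rate. This reading is inferred from the numbers, not taken from the engine's code, and it assumes a constant quarterly hazard. It reproduces all three rows to one decimal place. NPV discounting, LGD and fees appear not to enter these particular figures.

```python
from typing import NamedTuple

REQUIRED_RETURN = 0.06  # from the Financial Parameters


class SegmentEconomics(NamedTuple):
    annual_pd: float
    required_rate: float
    margin: float


def annual_pd_from_quarterly(q_hazard: float) -> float:
    """Constant quarterly hazard over four quarters: PD = 1 - (1 - h)^4."""
    return 1.0 - (1.0 - q_hazard) ** 4


def segment_economics(q_hazard_drifted: float, coupon: float) -> SegmentEconomics:
    pd = annual_pd_from_quarterly(q_hazard_drifted)
    required = pd + REQUIRED_RETURN
    return SegmentEconomics(pd, required, coupon - required)


rows = {
    "High Risk": (0.0857, 0.14),
    "Medium Risk": (0.0369, 0.12),
    "Low Risk": (0.0134, 0.09),
}
for name, (h, coupon) in rows.items():
    e = segment_economics(h, coupon)
    print(f"{name}: PD {e.annual_pd:.1%}, required {e.required_rate:.1%}, "
          f"margin {e.margin:.1%}")
```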
Decision Matrix
Strategic recommendations by segment
High Risk
EXIT
- 30.38% default rate
- -18.49% margin
- Required: 47.20%
Medium Risk
EXIT
- 15.96% default rate
- -5.96% margin
- Required: 26.96%
Low Risk
REPRICE
- 6.47% default rate
- -2.47% margin
- Required: 17.47%
DB Intelligence
AI-powered portfolio analysis & strategic recommendations
Scenario Builder
Model macro shocks, underwriting changes, and pricing strategies