Wednesday, September 23, 2020

Does job title affect LendingClub default rate?

It was always unpleasant when a LendingClub loan in my portfolio defaulted.  Out of curiosity, I would click into the loan and look for clues that my model and I might have missed.  One possibly biased observation is that I saw many "nurse" and "teacher" defaults.  For some time, my gut feeling was to lower my model's tendency to pick certain job titles, but I resisted that temptation because I had no solid evidence to support it.  LendingClub's loan applicants have tens of thousands of different job titles, some with typos and some abbreviated, so it is difficult to feed the user-typed job title into the model as a categorical factor: that would create too many categories, and a new job title could not be mapped to any existing one.

Recently, I started playing with NLP, and a simple idea came up:

  • vectorize job title
  • cluster vectors

I used Google's pre-trained word2vec model to vectorize the borrowers' job titles in LendingClub's historical data.  I then grouped the vectors with a clustering algorithm (e.g. KMeans).
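A minimal sketch of these two steps in Python might look like the following. It is only an illustration under several assumptions: the post does not say how multi-word titles were turned into a single vector, so the sketch averages the word vectors; lowercasing the title, the function names, and the tiny 2-d `toy_vectors` dictionary are all hypothetical. With real embeddings, `word_vectors` would be something that supports `word in word_vectors` and `word_vectors[word]`, such as a gensim `KeyedVectors` object.

```python
import numpy as np
from sklearn.cluster import KMeans


def title_to_vector(title, word_vectors, dim):
    """Average the vectors of the known words in a job title.

    Averaging is an assumption; titles with no known word map to zeros.
    """
    words = [w for w in title.lower().split() if w in word_vectors]
    if not words:
        return np.zeros(dim)
    return np.mean([word_vectors[w] for w in words], axis=0)


def cluster_titles(titles, word_vectors, dim, n_clusters):
    """Return one KMeans cluster label per job title."""
    X = np.vstack([title_to_vector(t, word_vectors, dim) for t in titles])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(X)


# Hypothetical 2-d stand-in for real 300-d word2vec embeddings.
toy_vectors = {
    "nurse": np.array([1.0, 0.0]),
    "registered": np.array([0.9, 0.1]),
    "analyst": np.array([0.0, 1.0]),
    "business": np.array([0.1, 0.9]),
}
titles = ["Registered Nurse", "Nurse", "Business Analyst", "Analyst"]
labels = cluster_titles(titles, toy_vectors, dim=2, n_clusters=2)
print(dict(zip(titles, labels)))
```

In the toy data the two nurse titles should end up in one cluster and the two analyst titles in the other. The cluster ids themselves are arbitrary.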

Here are some interesting findings:

My simple clustering exercise works reasonably well; it does cluster similar job titles together.  The following job titles are in one cluster:

Research
Speech Language Pathologist
Chemistry Lead
Senior Database Researcher
Associate Professor
Researcher
Radiologic technologist
Professor
Statistician
Scientist
Geologist

This cluster mostly groups scientific and research-oriented jobs together.

Here is another cluster:

Analyst
Liability Analyst
Columnist
Senior Strategist
Inventory Analyst
Project Analyst
Lead Analyst
Business Analyst

This group seems to be driven more by the keyword "Analyst".

I can then calculate the default rate of each cluster (cluster id on the left, default rate on the right):

0.0     0.161513
1.0     0.158743
2.0     0.178444
3.0     0.224322
4.0     0.197112
5.0     0.217364
6.0     0.276460
7.0     0.271106
8.0     0.221385
9.0     0.233080
10.0    0.206700
11.0    0.251355
12.0    0.186574
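Per-cluster rates like the ones above can be computed with a simple group-by. The sketch below assumes a pandas DataFrame with one row per loan; the column names `cluster` and `defaulted` (1 for a defaulted loan, 0 otherwise) and the six-row `loans` table are made up for illustration and are not LendingClub's actual field names or data.

```python
import pandas as pd


def default_rate_by_cluster(df, cluster_col="cluster", default_col="defaulted"):
    """Mean of the 0/1 default flag within each cluster, i.e. the default rate."""
    return df.groupby(cluster_col)[default_col].mean()


# Hypothetical toy data: two loans in cluster 0, four loans in cluster 1.
loans = pd.DataFrame({
    "cluster":   [0, 0, 1, 1, 1, 1],
    "defaulted": [1, 0, 0, 0, 1, 0],
})
rates = default_rate_by_cluster(loans)
print(rates)
```

Because the flag is 0 or 1, the group mean is the fraction of defaulted loans in that cluster. Small clusters will give noisy rates, so it is worth looking at the loan count per cluster as well.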
 

Some jobs do have a higher default rate than others.  The question is, does job title bring more information than I already have?  Clearly, different jobs indicate different income levels.  Will job title have predictive power after I account for income and other financial health information?  That remains to be found out...

 
