DESCRIPTION
Do you feel lost in the random forests? Does your career need a boost? Would you like to demystify magic words like cross-validation, bagging, shrinkage… or discover what is hidden behind wild acronyms like GAM, LASSO, GBM, etc. that you have heard during meetings or at the coffee machine without daring to ask? Do you wonder whether GLMs should still be considered by actuaries, or whether they would be better archived in a museum dedicated to the history of the actuarial discipline? If so, you should consider attending this intensive course on statistical learning techniques applied to insurance data analysis!
This course has been conceived by actuaries for actuaries, accounting for all the specificities of insurance data instead of simply re-using standard recipes borrowed from other fields. The sessions proceed step by step, recalling the fundamental statistical concepts at the heart of the modern learning techniques and the standard GLM approach, and then moving to GAMs, GBMs and tree-based methods (Module 2) like random forests. Their relative merits are illustrated by means of several case studies with insurance data.
The sessions aim to be interactive, alternating between methodological parts and case studies performed in front of the audience. Participants are invited to bring their own PC. Documentation, including data sets and R code, is made available through a supporting website. The R packages must be installed before attending.
Participants receive free copies of the reference manuals (co-authored by the trainers):
PROGRAM
1. INTRODUCTION TO INSURANCE ANALYTICS AND BASIC MODELS
- Fundamental statistical principles underlying the modern learning approaches (training vs. validation sets, prediction error, cross-validation, bootstrap, etc.)
- Insurance data specificities (claim numbers with an excess of zeros, claim severities mixing attritional and large claims, observational data, selection bias, correlation vs. causality, censoring)
- Recap of current GLM practice, with application to claim reserving, graduation of rates, risk classification
2. GENERALIZED LINEAR AND NONLINEAR REGRESSION MODELS
- Limitations of GLM tools and the need for other techniques
- Regularization/shrinkage for GLMs: Lasso, Ridge and related penalties
- First extensions: GAMs, double GLMs and GAMLSS with application to claim reserving, graduation of rates, risk classification
- Second extension: GAMboost and GBMs
ACQUIRED SKILLS
After completion of the training session, participants will have acquired a general knowledge of insurance analytics. They will be able to select the appropriate approach for their own data, run the R code and interpret the results.
One month after the end of the training, a follow-up discussion is organized to share experience in implementing the approaches that have been presented.