Abstract
The k-means algorithm and its variants are popular clustering techniques. Their purpose is to uncover group structures in a dataset. In actuarial applications, these methods detect clusters of policies with similar features and make it possible to draw a map of dominant risks. This working note starts with a review of the k-means algorithm and then develops two extensions to manage categorical features. We develop a mini-batch version that keeps computation time under control when analysing a high-dimensional dataset. We next introduce fuzzy k-means, in which policies can belong to multiple clusters. Finally, we conclude with a detailed introduction to spectral clustering.
Keywords: Clustering analysis, unsupervised learning, k-means, spectral clustering.
Sector: Insurance
Expertise: Machine learning
Authors: Charlotte Jamotton, Donatien Hainaut, and Thomas Hames
Publication: Risks
Date: September 2024
Language: English
Pages: 28
About the authors
Donatien Hainaut
Donatien Hainaut is a Scientific Advisor at Detralytics and a professor at UCLouvain (Belgium), where he serves as the Director of the Master’s program in Data Science with a statistical orientation. Prior to this, he held several academic positions, including Associate Professor at Rennes School of Business and ENSAE in Paris. He also has extensive industry experience, having worked as a Risk Officer, Quantitative Analyst, and ALM Officer.
Donatien is a Qualified Actuary and holds a PhD in the field of Asset and Liability Management. His current research focuses on contagion mechanisms in stochastic processes and the applications of neural networks in insurance.
Thomas Hames
Thomas is part of the Talent Consolidation Program (TCP) at Detralytics. Prior to joining Detralytics, Thomas worked as an intern in the P&C Retail department at AXA, where he developed a geospatial analysis based on machine learning models.