The k-means algorithm and its variants are popular clustering techniques. Their purpose is to uncover group structures in a dataset. In actuarial applications, these methods detect clusters of policies with similar features and allow one to draw a map of dominant risks. This working note starts with a review of the k-means algorithm and then develops two extensions to manage categorical features. We develop a mini-batch version that keeps computation time under control when analysing a high-dimensional dataset. We next introduce fuzzy k-means, in which policies can belong to multiple clusters. Finally, we conclude with a detailed introduction to spectral clustering.
Keywords: Clustering analysis, unsupervised learning, k-means, spectral clustering.
Sector: Insurance
Expertise: Machine learning
Authors: Charlotte Jamotton, Donatien Hainaut, and Thomas Hames
Publication: Risks
Date: September 2024
Language: English
Pages: 28
Donatien Hainaut is a Scientific Advisor at Detralytics and a Professor at UCLouvain (Belgium), where he directs the Master's in Data Science with a statistics orientation. He previously held several academic positions, notably as Associate Professor at Rennes School of Business and at ENSAE in Paris. He also has solid industry experience, having worked as a Risk Officer, Quantitative Analyst and ALM Officer.
He is a qualified actuary and holds a PhD in Asset and Liability Management. His current research focuses on contagion mechanisms in stochastic processes and on applications of neural networks in insurance.
Thomas is part of the Talent Consolidation Program (TCP) at Detralytics. Prior to joining Detralytics, he worked as an intern in the P&C Retail department at AXA, where he developed a geospatial analysis based on machine learning models.
Charlotte is a PhD Student in Actuarial Science at UCLouvain.