Insurance Analytics with Clustering Techniques

Detralytics
September 5, 2024

Abstract

The k-means algorithm and its variants are popular clustering techniques. Their purpose is to uncover group structures in a dataset. In actuarial applications, these methods detect clusters of policies with similar features and make it possible to draw a map of dominant risks. This working note starts with a review of the k-means algorithm and then develops two extensions to manage categorical features. We develop a mini-batch version that keeps computation time under control when analysing a high-dimensional dataset. We next introduce fuzzy k-means, in which policies can belong to multiple clusters. Finally, we conclude with a detailed introduction to spectral clustering.
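To make the basic idea concrete, the following is a minimal sketch of standard Lloyd-style k-means in pure Python. It is not taken from the working note: the function name `kmeans`, the toy `policies` data (two hypothetical standardised features per policy), and the choice to initialise centroids with the first k observations are all assumptions made for illustration. In practice, random or k-means++ initialisation is more common.

```python
def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations on a list of equal-length numeric tuples."""
    # Simplifying assumption: use the first k observations as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stable: stop early
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy portfolio: two well-separated groups of policies.
policies = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(policies, k=2)
```

On this toy data the first three policies end up in one cluster and the last three in the other. The sketch handles numeric features only; the categorical extensions, mini-batch, fuzzy, and spectral variants mentioned in the abstract are not covered here.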

Keywords: Clustering analysis, unsupervised learning, k-means, spectral clustering.

Sector: Insurance

Expertise: Machine learning

Authors: Charlotte Jamotton, Donatien Hainaut, and Thomas Hames

Publication: Risks

Date: September 2024

Language: English

Pages: 28

About the authors

Donatien Hainaut

Donatien Hainaut is a Scientific Advisor at Detralytics and a professor at UCLouvain (Belgium), where he serves as the Director of the Master’s program in Data Science with a statistical orientation. Prior to this, he held several academic positions, including Associate Professor at Rennes School of Business and ENSAE in Paris. He also has extensive industry experience, having worked as a Risk Officer, Quantitative Analyst, and ALM Officer.

Donatien is a Qualified Actuary and holds a PhD in the field of Asset and Liability Management. His current research focuses on contagion mechanisms in stochastic processes and the applications of neural networks in insurance.

Thomas Hames

Thomas is part of the Talent Consolidation Program (TCP) at Detralytics. Prior to joining Detralytics, Thomas worked as an intern at AXA in the P&C Retail department and developed a Geo-Spatial analysis based on Machine Learning models.

Charlotte Jamotton

Charlotte is a PhD Student in Actuarial Science at UCLouvain.
