Abstract

The k-means algorithm and its variants are popular clustering techniques whose purpose is to uncover group structures in a dataset. In actuarial applications, these methods detect clusters of policies with similar features and make it possible to draw a map of dominant risks. This working note starts with a review of the k-means algorithm and then develops two extensions to handle categorical features. We develop a mini-batch version that keeps computation time under control when analysing a high-dimensional dataset. We next introduce fuzzy k-means, in which policies can belong to multiple clusters. Finally, we conclude with a detailed introduction to spectral clustering.

Keywords: Clustering analysis, unsupervised learning, k-means, spectral clustering.

Sector: Insurance

Expertise: Machine learning

Authors: Charlotte Jamotton, Donatien Hainaut, and Thomas Hames

Publication: Risks

Date: September 2024

Language: English

Pages: 28

About the authors

Donatien Hainaut

Thomas Hames

Charlotte Jamotton

Charlotte is a PhD Student in Actuarial Science at UCLouvain.
