Gaurav Tank

Aug 9, 2021

K-means Clustering and Its Use Case in the Security Domain

What is Unsupervised Learning?

What is K-means Clustering?

K-means Algorithm

  1. Specify the number of clusters K.
  2. Initialize the centroids by shuffling the dataset and then selecting K data points at random, without replacement.
  3. Repeat the following steps until the centroids stop changing, i.e., until the assignment of data points to clusters no longer changes:
     • Compute the squared distance between each data point and every centroid.
     • Assign each data point to the closest cluster (centroid).
     • Recompute the centroid of each cluster as the average of all data points assigned to that cluster.
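
The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production implementation: the function names `kmeans` and `squared_distance`, the `max_iters` and `seed` parameters, and the toy data are all assumptions made for this example. It also assumes that an empty cluster simply keeps its previous centroid, which is one of several possible conventions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Illustrative K-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the initial centroids
    # (random.sample draws without replacement).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None

    for _ in range(max_iters):
        # Step 3a/3b: assign every point to its closest centroid,
        # using the squared distance to each centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 3c: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )

    return centroids, assignments


# Toy data (made up for illustration): two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

Because the initial centroids are chosen at random, the cluster numbering can differ between seeds, and on less cleanly separated data the final clusters can differ too. In practice, a library implementation such as scikit-learn's `KMeans` is usually the better choice.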

Applications of K-means Clustering

  • Customer profiling
  • Market segmentation
  • Computer vision
  • Geo-statistics
  • Astronomy
  • Document clustering
  • Identifying crime-prone areas
  • Cluster analysis
  • Feature learning or dictionary learning
  • Insurance fraud detection
  • Public transport data analysis

Cyber profiling using K-means

1. Document analysis

2. Spam filter

3. Identifying fraudulent or criminal activity