K-means Clustering Tutorial

1. When do we use K-means Clustering?

K-means clustering is an algorithm that partitions a given data set into k clusters by minimizing the within-cluster variance, that is, the sum of squared distances between each observation and the center (mean) of its cluster. It is a type of unsupervised learning, which assigns labels to unlabeled input data. When there is no prior information about the population or its categories, K-means groups observations by similarity, measured as the distance between them. Grouping the data this way helps you understand the nature of each group and the overall structure of the data. The method takes k as input and divides a set of objects into k clusters. A good clustering has high similarity within each cluster and low similarity between clusters.
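To make the idea concrete, here is a minimal sketch of the standard iterative procedure (often called Lloyd's algorithm) in plain Python. It is only an illustration: the tutorial does not say how the tool implements K-means internally, and the function names, the random initialization, and the toy points are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Assign each point to one of k clusters (illustrative sketch only)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the procedure has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data (made up): two visibly separate groups of two-variable observations.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
wss = within_cluster_ss(points, labels, centroids)
print(labels, centroids, wss)
```

The quantity returned by `within_cluster_ss` is the value K-means tries to make small: the lower it is for a given k, the more similar the observations inside each cluster are.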

2. Find the “Statistics” section under the black banner at the top and click “K-means Clustering”.

3. Choose either your own file or a sample data set to run the K-means Clustering model. Let’s use the San Francisco Airport Satisfaction Data in 2018 (SFO 2018).

4. Click the “Select” button once you have made your choice.

5. Choose the number of clusters (k).

6. Select two or more variables. These variables should be continuous.

7. Click the “Run” button.

8. Check the result. You can view the Average Attribute Ratings Plots in three different ways: “Means”, “Standardized”, and “Vertical”.

9. If you click the “Table” button next to the plot, you can view the attribute values as a table.
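The sketch below shows one way the numbers behind a “Means” view and a “Standardized” view could be computed once every observation has a cluster label. It rests on assumptions: the tutorial does not define how the tool standardizes, so the example uses z-scores (each variable rescaled to mean 0 and standard deviation 1 across all observations) averaged per cluster. The ratings, the two variables, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def cluster_means(rows, labels):
    """Average every variable within each cluster; returns {label: [means]}."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    return {
        lab: [mean(col) for col in zip(*members)]
        for lab, members in sorted(groups.items())
    }


def standardize(rows):
    """Convert each variable to z-scores computed over all observations."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in rows
    ]


# Hypothetical 1-5 ratings for two made-up attributes, with cluster labels
# assumed to come from a K-means run with k = 2.
ratings = [(5, 4), (4, 5), (5, 5), (2, 1), (1, 2), (2, 2)]
labels = [0, 0, 0, 1, 1, 1]

raw_means = cluster_means(ratings, labels)               # "Means"-style values
std_means = cluster_means(standardize(ratings), labels)  # "Standardized"-style values
print(raw_means)
print(std_means)
```

Standardized values are easier to compare when the selected variables use different scales, because every variable is expressed as a deviation from its overall average.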