The goal of this question is to tune the hyperparameters of clustering techniques so that they recover the exact number of clusters, and to compare your results with the clusters provided.
Prereqs: You will be given a dataset with cluster IDs attached.
Using the dataset provided, determine the number of classes (clusters). Then, using the k-means technique, tune the hyperparameters until clustering the new dataset yields exactly the number of clusters found in the provided dataset.
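As a rough illustration of this step, the sketch below counts the distinct cluster IDs and then tries a few k-means settings with scikit-learn. The data here is a synthetic stand-in generated with make_blobs, since the provided dataset is not shown; the three-column layout (two features, then the class ID), the particular hyperparameter values tried, and the use of the adjusted Rand index to compare against the true classes are all assumptions made for the example, not requirements of the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Stand-in for the provided dataset (assumption): two feature columns
# followed by a third column holding the true cluster ID.
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
data = np.column_stack([X, y])

# The number of classes is the number of distinct IDs in the third column.
n_classes = np.unique(data[:, 2]).size

# Drop the class column before clustering.
features = data[:, :2]

# Try a few hyperparameter settings with n_clusters fixed to n_classes.
results = []
for init in ("k-means++", "random"):
    for n_init in (1, 10):
        km = KMeans(n_clusters=n_classes, init=init, n_init=n_init,
                    max_iter=300, random_state=0)
        labels = km.fit_predict(features)
        # Agreement with the true classes (ignores how cluster IDs are numbered).
        ari = adjusted_rand_score(data[:, 2], labels)
        results.append((init, n_init, km.inertia_, ari))

# Keep the setting with the lowest within-cluster sum of squares (inertia).
best = min(results, key=lambda r: r[2])
print("number of classes:", n_classes)
print("best (init, n_init, inertia, ARI):", best)
```

With real data, replace the make_blobs lines with your own loading code. The values in best are what you would report as the tuned hyperparameters.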
1. Load the new dataset together with its cluster IDs.
2. Make a scatter plot of the new dataset by true class. Use the first column on the x-axis and the second column on the y-axis.
3. Calculate the centroid of each cluster.
4. Eliminate the third column (the class column).
5. Cluster the new dataset using k-means and tune the hyperparameters until you obtain the exact number of clusters (the number of classes).
6. Report the new hyperparameters.
7. Make the same plot by the new cluster IDs obtained with the tuned hyperparameters.
8. Calculate and tabulate the new centroids of the clusters.
9. Comment on the similarities and differences between the two plots.
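One possible way to carry out the plotting and centroid steps is sketched below with NumPy, scikit-learn, and matplotlib. Again the data is a synthetic stand-in (an assumption), and the output file name clusters.png, the hyperparameter values, and the nearest-centroid matching used to line up the table rows are illustrative choices only. Matching is needed because k-means numbers its clusters arbitrarily, so cluster 0 from k-means need not correspond to class 0.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in for the new dataset (assumption): columns 1-2 are features,
# column 3 is the true class ID.
X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=1)
data = np.column_stack([X, y])

true_ids = data[:, 2].astype(int)
features = data[:, :2]          # the class column is dropped here
classes = np.unique(true_ids)

# Centroid of each true class: the mean of its points.
true_centroids = np.array([features[true_ids == c].mean(axis=0) for c in classes])

# k-means with the number of clusters set to the number of classes.
km = KMeans(n_clusters=classes.size, n_init=10, random_state=0)
new_ids = km.fit_predict(features)
new_centroids = km.cluster_centers_

# Side-by-side scatter plots: true class vs. k-means cluster ID.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
panels = ((true_ids, true_centroids, "True class"),
          (new_ids, new_centroids, "k-means cluster ID"))
for ax, (ids, cents, title) in zip(axes, panels):
    ax.scatter(features[:, 0], features[:, 1], c=ids, cmap="viridis", s=15)
    ax.scatter(cents[:, 0], cents[:, 1], c="red", marker="x", s=80)
    ax.set_xlabel("column 1")
    ax.set_ylabel("column 2")
    ax.set_title(title)
fig.savefig("clusters.png", dpi=100)
plt.close(fig)

# For each true centroid, find the index of the closest k-means centroid.
dists = np.linalg.norm(true_centroids[:, None, :] - new_centroids[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

# Tabulate true and k-means centroids row by row.
print("class  true_x  true_y  kmeans_x  kmeans_y")
for c, tc, j in zip(classes, true_centroids, nearest):
    nc = new_centroids[j]
    print(f"{c:5d}  {tc[0]:6.2f}  {tc[1]:6.2f}  {nc[0]:8.2f}  {nc[1]:8.2f}")
```

Comparing the two panels and the two centroid columns gives the material for the final comment: points that change color between panels are the ones k-means assigns differently from the true classes.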