Tuesday, April 26, 2016

Partitioning using SA Clustering

Well, it needs a lot of tuning, but my algorithm for Data Mining basically works. As this is a homework assignment and not something for actual publication, I'll probably leave it at that. Here's the plot on a uniform random set of points:

The rectangles are the boundaries of the partitions after two iterations. The annealing temperature starts at 0.99 and works down to 0.25.
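For anyone wondering how a temperature like that usually figures into simulated annealing, here's a minimal sketch. It's in Python rather than the R used for the assignment, and it is not the assignment code. The geometric cooling rate of 0.9, the function names `cooling_schedule` and `accept`, and the use of the standard Metropolis acceptance rule are all assumptions made for illustration; only the 0.99 and 0.25 endpoints come from the run described here.

```python
import math
import random


def cooling_schedule(t_start=0.99, t_end=0.25, rate=0.9):
    """Yield temperatures from t_start down to, but not below, t_end.

    Geometric cooling (multiply by `rate` each step) is an assumption;
    only the two endpoints are taken from the post.
    """
    t = t_start
    while t >= t_end:
        yield t
        t *= rate


def accept(delta_cost, temperature, rng=random):
    """Standard Metropolis rule (assumed, not confirmed by the post).

    Improvements (delta_cost <= 0) are always accepted. A worse move is
    accepted with probability exp(-delta_cost / temperature), so worse
    moves become less likely as the temperature drops.
    """
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


# Temperatures visited on the way from 0.99 down to the 0.25 cutoff.
temps = list(cooling_schedule())
```

With a schedule like this, early steps at a high temperature accept most proposed boundary changes, and later steps near 0.25 mostly accept only improvements. That is the usual reason for letting the process cool before reading off the partitions.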



OK, that's not exactly thrilling. But there weren't any real clusters, so what do you want? Let's try it with a clustered data set:


Sorry, it's hard to read. I'll need to fix that before I turn it in. Basically, it's found all three clusters in just two iterations, though if you look closely you can see that the bottom cluster is split into two partitions. Letting it cool to an annealing temperature of 0.25 (which is completely arbitrary; it just seemed to work for the test data I was using) fixes things up a bit.


Still not perfect, but there's a partition on each cluster and the boundaries aren't off by much. Furthermore, the code is pretty slick in R; every operation is vectorized. I'll publish that tomorrow after I turn it in.
