`kl` and `rubin` tend to select between 20 and 40 clusters across most datasets when 50 is the upper limit. Other metrics tend towards either 2-7 or 48-50, the latter sitting at or just below the upper limit. However, it is not clear whether `kl` and `rubin` retain their performance when far more than 50 clusters are tested.
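
One way to probe this open question is to rerun the selection with a larger upper limit and check how often a metric's chosen cluster count lands right at that limit. The following is a minimal sketch, not part of any clustering library: the helper name `fraction_near_upper`, the `margin` parameter, and the example values are all hypothetical and used only for illustration.

```python
def fraction_near_upper(selected_ks, k_max, margin=2):
    """Return the share of selected cluster counts within `margin` of `k_max`.

    selected_ks: cluster counts chosen by one metric, one entry per dataset.
    k_max:       upper limit on the number of clusters that was tested.
    margin:      how close to k_max counts as "at the limit" (assumed default).
    """
    if not selected_ks:
        raise ValueError("selected_ks must contain at least one value")
    hits = sum(1 for k in selected_ks if k >= k_max - margin)
    return hits / len(selected_ks)


# Placeholder values for illustration only, not measured results.
example_selected = [48, 50, 3, 25]
print(fraction_near_upper(example_selected, k_max=50))  # 2 of 4 values are >= 48, prints 0.5
```

Comparing this fraction for the same metric at an upper limit of 50 and at a much larger limit may help indicate whether its choices follow the limit or stay in a stable range.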