Members of a K-means cluster #3
Comments
Hi, This feature is available but the documentation does not currently discuss it. This information is returned as a NumPy array by the
This array (referred to as |
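For readers who want to see the idea concretely: once you have an array of cluster ids, one per row (or column), listing the members of each cluster is a grouping step. The sketch below is not the Clustergrammer API. The function `cluster_members` and the example names are hypothetical, and it assumes only that the cluster-id array has the same order and length as your row names.

```python
from collections import defaultdict


def cluster_members(names, cluster_ids):
    """Group row (or column) names by their K-means cluster id.

    `cluster_ids` may be a list or a NumPy array of integer labels;
    it is assumed to be aligned with `names` (same order, same length).
    """
    if len(names) != len(cluster_ids):
        raise ValueError("names and cluster_ids must have the same length")
    members = defaultdict(list)
    for name, cluster_id in zip(names, cluster_ids):
        # int() normalizes NumPy integer types to plain Python ints
        members[int(cluster_id)].append(name)
    return dict(members)


# Hypothetical example data, not taken from the thread
gene_names = ["geneA", "geneB", "geneC", "geneD", "geneE"]
cluster_ids = [0, 2, 0, 1, 2]
members = cluster_members(gene_names, cluster_ids)
print(members[0])
```

The resulting dictionary maps each cluster id to the list of names assigned to it, which can then be written out or inspected per cluster.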
Thank you for your fast response! That's very helpful and does solve my original query. I am making good progress on applying this to my own data. Now I am wondering what the best way to display this on the heatmap would be?
Great, I'm glad that helped. I updated the example notebook to show how the K-means cluster IDs can be overlaid on the original data by adding an additional row category (see below). Let us know if that answers your question and if you have any other questions.
My dataset consists of 1,000 bacterial strains and data relating to their ~3,000 genes. My primary motivation for downsampling is to simplify the heatmap to a manageable size. The solution you have proposed does not address this, as the result is the same size as the original dataset. I can see useful tips in your update on modifying labels and adding columns, and I am sure I will be able to incorporate these at a later date. My ideal solution would almost be the reverse of the one you proposed: the K-means clustered heatmap, with details as to which gene is represented by which K-means cluster. I can see many problems with what I am proposing, and I am trying it out myself. It may also go against my aim of simplifying the data. What do you think? I have been able to plot heatmaps of individual K-means clusters, but this is not nearly as elegant as is possible, I'm sure! It seems as though having gene names in one column beside the K-means clusters would be a messy (and probably impossible) way to show such information.
I see, it sounds like your matrix is ~1,000 columns/strains by ~3,000 rows/genes and you are looking to reduce the size of your dataset to something more manageable. It will probably be difficult to show the gene list of the downsampled clusters (and this is not currently supported by Clustergrammer). I would recommend a couple of things based on our experience with similar datasets. We used Clustergrammer to visualize the Cancer Cell Line Encyclopedia, which is ~1,000 columns/cancer cell lines by ~20,000 rows/genes (see CCLE Notebook). We first filtered for the top 1,000 most variable genes and then downsampled our cell lines to obtain 100 cell line clusters (downsampling also keeps track of the most common category in each cluster). So if you can filter your genes down (based on variance or sum), then something like this might be useful. The MNIST notebook also does something similar. If you can add some category to your genes, then this would be tracked with the downsampling, but it is not exactly what you are asking for. Finally, the next version of Clustergrammer will be built in WebGL, which can handle much larger datasets like these. Here's a very simple visualization of a random matrix of 1,000 rows by 1,000 columns built in WebGL to demonstrate how much data can be handled. We can keep you up to date on this progress.
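As a rough illustration of the variance-based filtering step mentioned above, here is a minimal NumPy sketch. It is not how Clustergrammer implements its own filtering. The helper `top_variable_rows` and the toy matrix are hypothetical, and it assumes genes are rows and strains are columns.

```python
import numpy as np


def top_variable_rows(matrix, row_names, n_keep):
    """Keep the n_keep rows with the highest variance across columns.

    Returns the kept row names and the filtered matrix, both in the
    original row order.
    """
    matrix = np.asarray(matrix, dtype=float)
    variances = matrix.var(axis=1)
    # argsort is ascending, so reverse it to rank rows from most to least variable
    ranked = np.argsort(variances)[::-1][:n_keep]
    keep = np.sort(ranked)  # restore the original row order
    return [row_names[i] for i in keep], matrix[keep]


# Hypothetical toy data: 5 genes (rows) by 4 strains (columns)
toy = [
    [1, 1, 1, 1],
    [0, 10, 0, 10],
    [2, 3, 2, 3],
    [5, 5, 5, 6],
    [0, 100, 0, 100],
]
toy_names = ["geneA", "geneB", "geneC", "geneD", "geneE"]
kept_names, kept = top_variable_rows(toy, toy_names, n_keep=2)
print(kept_names)
```

With a real dataset, the filtered matrix would then be passed on to whatever downsampling or clustering step you use. The choice of `n_keep` is a judgment call that depends on how much of the matrix you can comfortably display.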
The link to the notebook you made is now dead, unfortunately!
Which notebook? Can you provide the link? The links I checked on this thread still appear to work.
You are right! I should have checked today. I am certain they were down yesterday.
No problem, did the approaches we recommended work out?
Hi,
I am struggling to find the right documentation that details which rows or columns have been clustered into a specific K-means cluster.
Is this feature available? If so, what would you suggest is the best way to go about doing this?
I have been looking in the notebooks of the examples but cannot find it. The cytof notebook does detail a similar process, but it is more complicated.