An interactive R Shiny application for exploring hierarchical clustering alongside the underlying data. It pairs a dendrogram with a heatmap in zoomable plots and offers a choice of distance and clustering methods.
This application combines hierarchical clustering dendrograms with interactive heatmaps to provide a comprehensive visualization of high-dimensional data patterns. Users can simultaneously explore both the clustering structure (via dendrogram) and the underlying data values (via heatmap) with synchronized zooming, cluster visualization, and multiple export options. The application is designed to handle any numerical dataset suitable for hierarchical clustering analysis.
The example data provided for application exploration was sourced from this publication: https://doi.org/10.1126/sciadv.abf5733. Lopes, et al. 2021. Systematic dissection of transcriptional regulatory networks by genome-scale and single-cell CRISPR screens. Science Advances.
- Multiple Distance Methods: Choose from Manhattan, Euclidean, Maximum, or Binary distance calculations
- Multiple Clustering Methods: Select from Complete, Single, Average, or Ward.D2 linkage methods
- Dynamic Cut Height: Adjust clustering granularity with real-time visualization
- Color-coded Clusters: Each cluster gets a unique, high-contrast color applied to both dendrogram labels and gene names
- Cluster Intersection Labels: See cluster numbers displayed at cut height intersections
- Side-by-side Layout: Heatmap and dendrogram displayed together with synchronized coordinates
- Coordinated Views: Both plots maintain alignment for easy pattern identification
- Conditional Y-axis Labels: Row numbers shown in full view, hidden when zoomed
- Value-based Coloring: Heatmap uses blue-white-red gradient for intuitive data interpretation
- Numeric Input Zoom: Precise Y-axis zoom control using minimum and maximum row number inputs
- Gene Search & Zoom: Find and zoom to specific genes by name
- Reset Functionality: Return to the full view with the "Reset Zoom" button
- Cluster Data Export: Download cluster assignments as CSV files with descriptive filenames
- High-Quality Image Export: Export current view (full or zoomed) as publication-ready 300 DPI PNG files
- Dynamic Filenames: All exports automatically include parameters and zoom status for easy organization
- Cut Height Visualization: Red dashed line shows current clustering threshold
- Missing Data Handling: Graceful handling of NA values with heatmap cells shown in grey
- Responsive Design: Clean, professional layout optimized for data exploration
- Real-time Updates: All visualizations update instantly as parameters change
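
The distance, linkage, and cut-height options listed above correspond to base R's `dist()`, `hclust()`, and `cutree()`. The following is a minimal sketch on a made-up matrix, not the application's own code; the matrix `mat`, its labels, and the cut height of 1 are assumptions chosen purely for illustration.

```r
# Toy matrix: 6 items (rows) x 3 conditions (columns); all values are made up.
mat <- matrix(
  c( 2.5, -1.2,  0.8,
     2.4, -1.0,  0.9,
    -0.3,  3.1, -2.1,
    -0.2,  3.0, -2.0,
     1.7,  0.4,  1.9,
     1.6,  0.5,  2.0),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("Gene_", LETTERS[1:6]), paste0("Condition_", 1:3))
)

# Distance method: "manhattan", "euclidean", "maximum", or "binary".
d <- dist(mat, method = "euclidean")

# Linkage method: "complete", "single", "average", or "ward.D2".
hc <- hclust(d, method = "ward.D2")

# Cutting the tree at a height assigns each row to a cluster.
# A lower cut height yields more, smaller clusters.
clusters <- cutree(hc, h = 1)
print(clusters)
```

In this toy data the rows come in three near-identical pairs, so a cut height of 1 separates the pairs into three clusters. Raising the height merges them into fewer clusters.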
# Required R packages
install.packages(c("shiny", "dplyr", "tibble", "tidyr", "ggplot2", "ggdendro", "patchwork"))
- Clone this repository:
    git clone https://github.com/wendysphillips/interactive_dendrogram.git
    cd interactive_dendrogram
- Use the provided example data OR prepare your data file:
  - Replace "tre_data.tsv" in the script with your actual data file path
  - Ensure data has items to be clustered as rows and conditions/variables as columns
  - First column should contain labels of items to be clustered
- Launch the Shiny app:
    # In R console
    source("shiny_heatmap_dendrogram.r")
    create_zoom_heatmap()
  The application will open in your default web browser.
  OR
    # In terminal
    Rscript shiny_heatmap_dendrogram.r
  Then paste the provided link into a web browser.
interactive_dendrogram/
├── shiny_heatmap_dendrogram.r # Main heatmap Shiny application
├── generate_colors.R # Color palette generator for clusters
├── tre_data.tsv # Or replace with your dataset
├── README.md # This documentation
└── LICENSE # MIT License
- Select Methods: Choose your preferred distance calculation and clustering method from the dropdowns
- Adjust Cut Height: Use the numeric input to set clustering granularity and see cluster boundaries
- Explore Patterns: Observe how clustering patterns relate to heatmap data values
- Numeric Zoom: Enter specific row numbers in Y-axis minimum and maximum fields, then click "Apply Y-axis Zoom"
- Gene Search: Enter a gene name and click "Zoom to Gene" for automatic centering and zoom
- Reset View: Click "Reset Zoom" to return to the full dataset view
- Export Clusters: Click "Export Cluster Data" to download CSV with cluster assignments
- Export Images: Click "Export Image (PNG)" to save current view as high-resolution image
- File Organization: All exports include parameter settings and zoom status in filenames
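
To illustrate how export filenames can carry the parameter settings and zoom status, here is a small sketch. The helper `build_export_name()` and the exact filename pattern are hypothetical; the application's actual naming scheme may differ.

```r
# Hypothetical helper: builds a descriptive filename from the current settings.
build_export_name <- function(prefix, dist_method, clust_method, cut_height,
                              zoom_range = NULL, ext = "csv") {
  # Tag the file as "full" or with the zoomed row range.
  zoom_tag <- if (is.null(zoom_range)) {
    "full"
  } else {
    sprintf("zoom_%d-%d", as.integer(zoom_range[1]), as.integer(zoom_range[2]))
  }
  sprintf("%s_%s_%s_h%s_%s.%s",
          prefix, dist_method, clust_method, format(cut_height), zoom_tag, ext)
}

csv_name <- build_export_name("clusters", "euclidean", "ward.D2", 2.5)
png_name <- build_export_name("heatmap", "manhattan", "complete", 4,
                              zoom_range = c(10, 40), ext = "png")
print(csv_name)  # "clusters_euclidean_ward.D2_h2.5_full.csv"
print(png_name)  # "heatmap_manhattan_complete_h4_zoom_10-40.png"
```

Encoding the settings in the name means two exports made with different parameters never overwrite each other.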
This application works with any numerical dataset suitable for hierarchical clustering:
Data Format Requirements:
- Tab-separated values (.tsv)
- Row names in first column (genes, samples, conditions, etc.)
- Numerical data in subsequent columns
- Data should be pre-processed (scaled, normalized) as appropriate for your analysis
Example Data Structure:
Gene_ID Condition_1 Condition_2 Condition_3 ...
Gene_A 2.5 -1.2 0.8 ...
Gene_B -0.3 3.1 -2.1 ...
Gene_C 1.7 0.4 1.9 ...
... ... ... ... ...
Setup Steps:
- Place your data file in the same directory as the script
- Update the data loading line:
    df <- read.table("your_data_filename.tsv", header = TRUE, row.names = 1)
- Ensure proper data preprocessing (scaling, normalization) before clustering
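
Before launching the app with your own file, it can help to confirm that the file loads in the expected shape. The sketch below writes a tiny made-up TSV to a temporary file and runs a few basic checks. Two things here are assumptions rather than part of the script: `sep = "\t"` is added so that labels containing spaces are read correctly (the script's own line relies on `read.table()`'s default whitespace splitting), and the column-wise `scale()` step is only one possible preprocessing choice.

```r
# Write a tiny tab-separated example to a temporary file (made-up values).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "Gene_ID\tCondition_1\tCondition_2\tCondition_3",
  "Gene_A\t2.5\t-1.2\t0.8",
  "Gene_B\t-0.3\t3.1\t-2.1",
  "Gene_C\t1.7\t0.4\t1.9"
), tsv_path)

# First column becomes the row names; remaining columns hold the data.
df <- read.table(tsv_path, header = TRUE, row.names = 1, sep = "\t")

# Basic format checks before clustering.
stopifnot(
  all(vapply(df, is.numeric, logical(1))),  # every data column is numeric
  !anyDuplicated(rownames(df))              # row labels are unique
)

# Optional: center and scale each column if the data are not yet normalized.
mat <- scale(as.matrix(df))
print(dim(mat))
```

Whether to scale by column, by row, or not at all depends on the analysis; the check on numeric columns is the part most likely to catch a malformed file early.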
- Distance Methods: Manhattan, Euclidean, Maximum, Binary
- Clustering Methods: Complete linkage, Single linkage, Average linkage, Ward.D2 (minimum variance)
- Visualization Framework: ggplot2 with ggdendro for dendrogram extraction and patchwork for plot combination
- Coordinate System: Custom inversion logic to align ggplot2 coordinates with intuitive row numbering
- Reactive Programming: Shiny reactive framework ensures efficient real-time updates
- Export Formats: CSV (cluster data) and PNG (images) with customizable parameters
- Image Quality: 300 DPI at 16"×10" (4800×3000 pixels), sized for publications
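
The coordinate inversion mentioned above exists because a ggplot2 y-axis increases upward, while row numbering is most intuitive when row 1 sits at the top. The sketch below shows one way such a mapping, and a gene-centered zoom window, could be computed. The helpers `row_to_y()`, `zoom_limits()`, and `gene_window()`, along with the half-width of 5 rows, are hypothetical and are not taken from the application's source.

```r
# Map a 1-based row number (row 1 at the top) to a y coordinate
# on an axis that increases upward.
row_to_y <- function(row, n_rows) n_rows - row + 1

# Convert a row range into y-axis limits, padded by half a unit
# so that whole heatmap tiles stay visible.
zoom_limits <- function(row_min, row_max, n_rows) {
  c(row_to_y(row_max, n_rows) - 0.5, row_to_y(row_min, n_rows) + 0.5)
}

# Find a gene in the plotted row order and return a row window around it,
# clamped to the valid range.
gene_window <- function(gene, ordered_labels, half_width = 5) {
  idx <- match(gene, ordered_labels)
  if (is.na(idx)) stop("Gene not found: ", gene)
  n <- length(ordered_labels)
  c(max(1, idx - half_width), min(n, idx + half_width))
}

labels <- paste0("Gene_", 1:100)        # made-up row order
rows   <- gene_window("Gene_42", labels)
ylim   <- zoom_limits(rows[1], rows[2], length(labels))
print(rows)  # 37 47
print(ylim)  # 53.5 64.5
```

Limits computed this way could then be passed to something like `coord_cartesian(ylim = ...)`, which zooms without dropping data outside the window.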
- Application scales well with datasets up to several thousand rows
- Large datasets may experience slower rendering during zoom operations
- Memory usage scales with dataset dimensions and number of clusters
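
One memory contributor that can be estimated up front is the distance matrix: R's `dist()` stores one double (8 bytes) for each of the n(n-1)/2 row pairs. The helper `dist_size_mb()` below is a hypothetical back-of-the-envelope calculation and ignores everything else the application holds in memory.

```r
# Approximate size of a dist object for n_rows items, in megabytes.
dist_size_mb <- function(n_rows) {
  n_pairs <- n_rows * (n_rows - 1) / 2  # lower triangle only
  n_pairs * 8 / 1024^2                  # 8 bytes per double
}

sizes <- dist_size_mb(c(1000, 5000, 20000))
print(round(sizes, 1))  # roughly 3.8, 95.4, and 1525.8 MB
```

Because the pair count grows quadratically, doubling the number of rows roughly quadruples this cost.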
Contributions are welcome! Areas for improvement include:
- Additional distance/clustering methods
- Enhanced zoom controls (X-axis zoom)
- Alternative color schemes
- Performance optimizations for very large datasets
- Additional export formats
Please feel free to submit issues, feature requests, or pull requests.
This application was developed jointly by Wendy Phillips and GitHub Copilot through collaborative AI-assisted programming.
This project is licensed under the MIT License - see the LICENSE file for details.