Clustering #19

Merged
merged 2 commits into from
Jun 3, 2024


gwirn
Collaborator

@gwirn gwirn commented Jun 3, 2024

This adds a new class in data to prepare trajectories for training.

It first loads one or more trajectories and removes all atoms that are not part of a protein. If multiple trajectories are given, it joins them into a single trajectory.

After that, it can be used to subsample the trajectory with three different methods:

  • Stride: calculates the step size needed to take n_cluster frames with equal spacing between them. If that step size does not give exactly n_cluster frames, it removes as many frames as needed to get n_cluster.
  • distance_cluster: clusters all frames based on their RMSD to all other frames, then for each cluster determines the representative frame, i.e. the frame with the highest similarity to all other frames in the cluster. The clustering method currently used is agglomerative clustering, which can easily be swapped for any clustering method that accepts a distance matrix.
  • pca_cluster: runs a principal component analysis over all frames of the trajectory, uses the first n components as input for KMeans clustering, and then proceeds as distance_cluster to find the representative frames.
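To illustrate the frame selection described in the list above, here is a minimal NumPy sketch. It is not the code from this PR: the function names `stride_indices` and `representative_frames` are hypothetical, trimming surplus strided frames from the end is an assumption (the description does not say which frames are dropped), and "highest similarity" is read here as the smallest summed distance to the other members of the cluster. The cluster labels could come from any clustering method.

```python
import numpy as np


def stride_indices(n_frames, n_cluster):
    """Pick n_cluster frame indices with a constant integer stride.

    Hypothetical helper; surplus frames are trimmed from the end (an assumption).
    """
    if not 0 < n_cluster <= n_frames:
        raise ValueError("n_cluster must be between 1 and n_frames")
    step = max(n_frames // n_cluster, 1)
    idx = np.arange(0, n_frames, step)
    # An integer stride can yield more than n_cluster frames; drop the extras.
    return idx[:n_cluster]


def representative_frames(dist, labels):
    """Return one representative frame index per cluster.

    dist   : (n_frames, n_frames) symmetric distance matrix, e.g. pairwise RMSD
    labels : (n_frames,) cluster label of every frame
    The representative is the member with the smallest summed distance
    to all other members of its cluster (the medoid).
    """
    dist = np.asarray(dist)
    labels = np.asarray(labels)
    reps = []
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        # Distances restricted to the frames of this cluster.
        sub = dist[np.ix_(members, members)]
        reps.append(members[np.argmin(sub.sum(axis=1))])
    return np.sort(np.array(reps))
```

For example, `stride_indices(10, 4)` uses a stride of 2, which gives five candidate frames, and then drops the last one to return four indices.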

After that, it saves a new topology file for the trajectory, the new trajectory as a DCD file, and a txt file containing the indices that the selected frames have in the original trajectory.
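The index file makes it possible to map the subsampled frames back to the original trajectory. A minimal sketch of writing and reading such a file, assuming one index per line; the actual file layout used by this PR is not stated here, and the function names are hypothetical:

```python
import numpy as np


def save_frame_indices(path, indices):
    """Write the original-trajectory frame indices, one per line (assumed layout)."""
    np.savetxt(path, np.asarray(indices, dtype=int), fmt="%d")


def load_frame_indices(path):
    """Read the indices back as a 1D integer array."""
    return np.loadtxt(path, dtype=int, ndmin=1)
```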

@gwirn gwirn requested a review from degiacom June 3, 2024 13:31
Member

@degiacom degiacom left a comment


Useful update, all looks good to me

@degiacom degiacom merged commit 79abb57 into master Jun 3, 2024
2 checks passed
@degiacom degiacom deleted the clustering branch June 3, 2024 14:00

2 participants