Pre-computed files need to be regenerated for each set of parameters #16

shervinea · 2021-09-10T04:23:13Z

Context. Parsing PDB files in real time with the BioPython package, typically with a call such as:

enzynet/enzynet/pdb.py

Line 53 in 31d30e0

self.structure = PDBParser().get_structure(pdb_id.upper(), fullfilename)

is expensive and bottlenecks the training process if done on the fly.

For this reason, we put in place a "precomputation stage"

enzynet/enzynet/volume.py

Line 123 in 31d30e0

def check_precomputed(self) -> None:

that takes all enzymes beforehand and stores target volumes in a dedicated folder.

Current limitation. This process is repeated for each set of parameters {weights considered, interpolation level between atoms p, volume size}. This is inefficient in terms of:

total computations performed: PDB parsing is identical across all these configurations, yet it is repeated for each of them. The only remaining operations are relatively cheap, e.g. 2D -> 3D mapping and point interpolation. With a proper implementation, these last steps can easily be done on the fly without becoming a bottleneck.
space: the number and size of produced files grow at the same pace as the number of configurations that the user tries out (!).
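
To illustrate why point interpolation is cheap enough to run on the fly, here is a minimal sketch with NumPy. It assumes that p denotes the number of points inserted linearly between each pair of consecutive atoms; the function name `interpolate_points` is hypothetical and is not taken from the enzynet code base.

```python
import numpy as np


def interpolate_points(coords: np.ndarray, p: int) -> np.ndarray:
    """Insert p evenly spaced points between consecutive rows of coords.

    coords has shape (N, 3). The result has shape ((N - 1) * (p + 1) + 1, 3).
    Assumption: p is the number of points added per pair of atoms.
    """
    coords = np.asarray(coords, dtype=float)
    if p <= 0 or len(coords) < 2:
        return coords
    # Fractions 0, 1/(p+1), ..., p/(p+1) along each segment (end point excluded).
    t = np.arange(p + 1) / (p + 1)
    starts, ends = coords[:-1], coords[1:]
    segments = starts[:, None, :] + t[None, :, None] * (ends - starts)[:, None, :]
    # Flatten all segments and append the very last atom.
    return np.vstack([segments.reshape(-1, 3), coords[-1:]])
```

The whole operation is a single vectorized expression over the atom array, so its cost is small compared with parsing the PDB file.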

Desired behavior. Coordinates + weights precomputation from PDB files is done only once and produces a parsed version of the data that is:

Light enough so that it can be transformed to target volumes on the fly.
Complete enough so that all configurations' data can be derived from it.
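
One possible shape for this behavior is sketched below, using NumPy only. The parsed data (coordinates plus every candidate weight) is written once per enzyme, and the volume for any {weight, volume size} choice is derived at load time. The function names, the `.npz` layout, and the naive centering and scaling in `to_volume` are all assumptions made for illustration; they are not the existing enzynet implementation.

```python
import numpy as np


def save_parsed(path: str, coords, weights: dict) -> None:
    """Store atom coordinates (N, 3) and all named weight arrays (N,) once."""
    arrays = {f"w_{name}": np.asarray(w, dtype=np.float32) for name, w in weights.items()}
    np.savez_compressed(path, coords=np.asarray(coords, dtype=np.float32), **arrays)


def load_parsed(path: str):
    """Return (coords, {weight_name: values}) from a file written by save_parsed."""
    with np.load(path) as data:
        coords = data["coords"]
        weights = {k[2:]: data[k] for k in data.files if k.startswith("w_")}
    return coords, weights


def to_volume(coords: np.ndarray, weights: np.ndarray, size: int) -> np.ndarray:
    """Map atoms to a size^3 grid (illustrative scaling, not enzynet's)."""
    centered = coords - coords.mean(axis=0)
    radius = np.abs(centered).max() or 1.0  # avoid division by zero
    # Rescale to [0, size - 1] and round to the nearest voxel index.
    idx = np.floor((centered / radius + 1.0) / 2.0 * (size - 1) + 0.5).astype(int)
    idx = np.clip(idx, 0, size - 1)
    volume = np.zeros((size, size, size), dtype=np.float32)
    # If several atoms fall in the same voxel, the last one wins.
    volume[idx[:, 0], idx[:, 1], idx[:, 2]] = weights
    return volume
```

With such a layout, trying a new volume size or weight type only changes the arguments passed to `to_volume`; no file needs to be regenerated.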

shervinea added the Feature request label Sep 10, 2021
