This repository has been archived by the owner on Sep 24, 2019. It is now read-only.

Current memory constraints. #1

Open
jhourani opened this issue Aug 28, 2013 · 0 comments

Comments

@jhourani
Contributor

Currently, gp_baseline.py consumes a lot of memory. It runs on an 8GB machine provided the PubChem dataset is not included; with PubChem, memory usage can exceed 30GB. Replacing the underlying dictionaries in parsed.py with an on-disk database (Python's dbm module) has been shown to vastly reduce memory usage on a subset of the data. The design would have to be modified slightly so that the database is not opened and closed on every iteration of the parser (potentially tens of millions of iterations).
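A minimal sketch of the proposed change, assuming the standard-library dbm module; the key/value names and the temporary path are illustrative, not taken from parsed.py. The point is that the database handle is created once, before the parse loop, rather than per record:

```python
import dbm
import os
import tempfile

# Hypothetical stand-in for the dictionaries in parsed.py: an on-disk
# dbm database, so entries live on disk instead of in RAM.
db_path = os.path.join(tempfile.mkdtemp(), "parsed_cache")

# Open the database ONCE, before iterating, instead of opening/closing
# it inside the parser loop (which may run tens of millions of times).
with dbm.open(db_path, "c") as db:
    for i in range(1000):  # stands in for the real parse iterations
        # dbm stores raw bytes, so keys and values are encoded explicitly
        db[f"record:{i}".encode()] = f"value-{i}".encode()

    # Lookups behave like a dict, but the data is not held in memory
    assert db[b"record:42"] == b"value-42"
```

A dict-like wrapper (e.g. `shelve`, which layers pickling on top of dbm) could keep the rest of parsed.py unchanged while swapping the storage backend.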
