EntityDomain.add_entity() slow #119

Open
miroli opened this issue Nov 4, 2019 · 5 comments

Comments

miroli commented Nov 4, 2019

We've run into performance issues when running ddf_utils.package.create_datapackage(). Some of our files contain hundreds of thousands of entities, and in those cases the function takes a very long time to run.

After some profiling, it turns out that the culprit is EntityDomain.add_entity() in ddf_utils.model.ddf, which, as I understand it, loops through all rows in the entity files and runs some identity checks. Would it be possible to vectorize that loop?
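For readers who want to reproduce this kind of finding, here is a minimal profiling sketch using only the standard library. `ToyDomain`, `load_entities`, and `profile_call` are hypothetical stand-ins written for illustration; they are not ddf_utils code, and the per-call duplicate check is only an assumption about what such an identity check might look like.

```python
import cProfile
import io
import pstats


class ToyDomain:
    """Hypothetical stand-in for an entity domain; not the ddf_utils class."""

    def __init__(self):
        self.entities = []

    def add_entity(self, entity_id):
        # Assumed per-call identity check: scans everything added so far,
        # so adding n entities one by one costs O(n^2) comparisons overall.
        if entity_id in self.entities:
            raise ValueError(f"duplicate entity: {entity_id}")
        self.entities.append(entity_id)


def load_entities(n):
    """Add n entities one at a time, as a row-by-row loader would."""
    domain = ToyDomain()
    for i in range(n):
        domain.add_entity(f"e{i}")
    return domain


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


domain, report = profile_call(load_entities, 2000)
print(report)
```

In the printed report, `add_entity` should show up near the top with one call per entity, which is the pattern that points to a per-row loop as the hotspot.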

Owner

semio commented Nov 5, 2019

Yes, you are right: calling add_entity for a lot of entities is expensive. I think it's possible to avoid calling add_entity one by one. I will improve the code soon.
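One way to avoid per-entity calls is to validate the whole entity table in a single pass with pandas. The sketch below is only an illustration of that idea: `validate_entities` is a hypothetical helper, the rule it enforces (no missing or repeated IDs) is an assumption, and the sample column names are made up. It is not the change that was made in ddf_utils.

```python
import pandas as pd


def validate_entities(df, id_column):
    """Check a whole entity table at once instead of row by row.

    Hypothetical helper: raises ValueError if any ID is missing or
    repeated, and returns the IDs as a list otherwise.
    """
    ids = df[id_column]
    if ids.isna().any():
        raise ValueError("entity table contains missing IDs")
    # duplicated(keep=False) marks every row whose ID occurs more than once.
    dupes = ids[ids.duplicated(keep=False)].unique()
    if len(dupes) > 0:
        raise ValueError(f"duplicate entity IDs: {sorted(dupes)}")
    return ids.tolist()


# Made-up sample data for illustration only.
entities = pd.DataFrame(
    {"geo": ["swe", "nor", "fin"], "name": ["Sweden", "Norway", "Finland"]}
)
print(validate_entities(entities, "geo"))
```

Because the checks run on the whole column, the cost grows roughly linearly with the number of entities rather than with the number of calls times the size of the domain.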

semio pushed a commit that referenced this issue Nov 11, 2019
- add validator for EntityDomain initialization
- and avoid add_entity()
semio pushed a commit that referenced this issue Nov 11, 2019
Owner

semio commented Nov 11, 2019

@miroli I updated the process for loading entity domains and tested the create_datapackage function against a dataset with 1,000,000 entities. It can create the datapackage in 12 minutes.

Could you test the master branch against your dataset? If it's not convenient for you to install from source I will make a release for you.

Author

miroli commented Nov 11, 2019

That's great news! If you could make a release, that would be even better, as installing from source is tricky with our current setup.

Owner

semio commented Nov 11, 2019

OK, v1.0.6 is ready. Please give it a try.

Author

miroli commented Nov 29, 2019

It's much better now, thanks!
