Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running a filter hits all tables, instead of just one #56

Open
teddyward opened this issue Dec 1, 2018 · 2 comments
Open

Running a filter hits all tables, instead of just one #56

teddyward opened this issue Dec 1, 2018 · 2 comments

Comments

@teddyward
Copy link

teddyward commented Dec 1, 2018

Hi! Thank you for making this library. I am really happy that someone has done this. I am having trouble getting it to work as-expected, though (maybe related to #34 ?)

I have annotated a model as follows. This works rather nicely to shuttle incoming data into different tables:

@architect.install('partition', type='range', subtype='string_firstchars',
                   constraint='2', column='state')
class Parcel(models.Model):

I then query it as follows:

qs = Parcel.objects.filter(state='15', boundary__intersects=polygon)

I would expect this query to only try to hit the table distribution_parcel_15, but when I didn't see any performance improvement post-adding-partitions, I ran explain() on the above, and saw that every child table was being hit.

        ->  Index Scan using distribution_parcel_34_boundary_idx on distribution_parcel_34  (cost=0.41..8.68 rows=1 width=1089)
              Index Cond: (boundary && '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry)
              Filter: (((state)::text = '15'::text) AND _st_intersects(boundary, '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry))
        ->  Index Scan using distribution_parcel_12_boundary_idx on distribution_parcel_12  (cost=0.41..8.68 rows=1 width=1805)
              Index Cond: (boundary && '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry)
              Filter: (((state)::text = '15'::text) AND _st_intersects(boundary, '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry))
...

and EXPLAIN SELECT COUNT(*) FROM distribution_parcel WHERE state='15'; gives:

                     ->  Parallel Index Only Scan using state_34 on distribution_parcel_34  (cost=0.43..4.44 rows=1 width=0)
                           Index Cond: (state = '15'::text)
                     ->  Parallel Index Only Scan using state_12 on distribution_parcel_12  (cost=0.43..4.44 rows=1 width=0)
                           Index Cond: (state = '15'::text)
...

Am I fundamentally misunderstanding partitions, or what?

@emord
Copy link
Contributor

emord commented Dec 2, 2018

@teddyward I ran into a similar issue recently and updated the docs around this here: https://architect.readthedocs.io/features/partition/postgresql.html#performance

#55 should provide some more information, but basically a simple WHERE state = 'ab' won't be able to use the check constraint because the check constraints are constructed using a postgres function to get a substring. The short term fix for this is to replicate that function in your query.f

(note I'm not the project's maintainer) For the long term, I'd like to add a feature to architect that would allow for unique partitioning that would create simple check constraints such as state = 'ab', but have not had a chance and don't foresee doing that in the near future.

@teddyward
Copy link
Author

teddyward commented Dec 2, 2018

Huh, thank you!!! This tip probably just reduced the runtime of my full script by about 29 days (99% or so). I was able to achieve that through Django by adding the following to my filter clause:

.extra(where=["lower(substr(state, 1, 2)) = '15'"])

I couldn't find a cleaner way to do it, as the __icontains and __istartswith clauses produce slightly different queries. Edit: the substr class will probably do this more cleanly. But the above clause gives the exact explain that I wanted and a MUCH faster query!

        ->  Index Scan using distribution_parcel_15_boundary_idx on distribution_parcel_15  (cost=0.28..8.56 rows=1 width=1567)
              Index Cond: (boundary && '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry)
              Filter: ((lower(substr((state)::text, 1, 2)) = '15'::text) AND _st_intersects(boundary, '0106000020E61000000100000001030000000100000005000000969EB6EA0DB763C060D33019BE503540F9408D560EB763C021FADE3CC45035402FD0AC070EB763C0E82FABB7FA503540406BAFC50DB763C0E625D2CAF8503540969EB6EA0DB763C060D33019BE503540'::geometry))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants