Commit
Use .distinct instead of .group(:id) in new TN relationship filters #3094

I initially used .group(:id) because I noticed one of the queries seemed really slow. I now think it was just the "Subject or object of a relationship" case of the "Relationship subject/object" facet in Filter TNs, tested here on the UCD project, on my personal computer. The result set is 58,337 records.

If I benchmark taxon_names/index.json.jbuilder with:

```ruby
time = Benchmark.measure do
  json.array!(@taxon_names) do |taxon_name|
    json.partial! '/taxon_names/attributes', taxon_name: taxon_name
  end
end
puts time.real
```

and run the filter specified above I get:

~.15s with .group(:id)
~2.5s with .distinct :(

with the code prior to this commit.

The scope being used is

```ruby
scope :with_taxon_name_relationships, -> {
  joins('LEFT OUTER JOIN taxon_name_relationships tnr1 ON taxon_names.id = tnr1.subject_taxon_name_id').
  joins('LEFT OUTER JOIN taxon_name_relationships tnr2 ON taxon_names.id = tnr2.object_taxon_name_id').
  where('tnr1.subject_taxon_name_id IS NOT NULL OR tnr2.object_taxon_name_id IS NOT NULL')
  # --> either .distinct or .group(:id) here
}
```

Using

```ruby
referenced_klass_union([
  ::TaxonName.with_taxon_name_relationships_as_subject,
  ::TaxonName.with_taxon_name_relationships_as_object
])
```

instead gives ~0.9s.

Using

```ruby
::TaxonName.joins('join taxon_name_relationships ON ' \
  'taxon_names.id = taxon_name_relationships.subject_taxon_name_id OR ' \
  'taxon_names.id = taxon_name_relationships.object_taxon_name_id'
).distinct
```

instead gives ~0.7s (using `.group(:id)` brings it down to ~0.15-.02).

Leaving things there - using distinct - for now. (Preferring distinct for semantics and to match(?) the rest of the codebase.)

[

```ruby
::TaxonName.joins('join taxon_name_relationships ON ' \
  'taxon_names.id = taxon_name_relationships.subject_taxon_name_id OR ' \
  'taxon_names.id = taxon_name_relationships.object_taxon_name_id'
).select('DISTINCT ON (taxon_names.id) taxon_names.*')
```

gets you back down into the ~.15s range, but DISTINCT ON () is pg-specific.

I don't really understand why .distinct isn't being optimized to 'distinct on id' in this case since you're running distinct across rows of TaxonName, which are unique on id...]
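
One possible reading of the gap between .distinct and .group(:id) / DISTINCT ON: a plain `SELECT DISTINCT taxon_names.*` deduplicates by comparing every selected column, while the other two only compare the id. The sketch below is a pure-Ruby, in-memory analogy of that difference, not a description of what PostgreSQL actually does for this query. `TaxonRow`, `joined_rows`, `dedupe_all_columns` and `dedupe_on_id` are hypothetical names invented for illustration, and the column list is made up.

```ruby
require 'benchmark'

# Hypothetical stand-in for a taxon_names row; the columns are illustrative only.
TaxonRow = Struct.new(:id, :cached, :rank_class, :year)

# Simulate join fan-out: each taxon name shows up once per matching relationship.
def joined_rows(names: 1_000, fan_out: 5)
  (1..names).flat_map do |id|
    Array.new(fan_out) { TaxonRow.new(id, "Name #{id}", 'Species', 1900 + id % 100) }
  end
end

# Analogue of SELECT DISTINCT taxon_names.*: rows are equal only if every column matches
# (Struct#hash and Struct#eql? cover all members).
def dedupe_all_columns(rows)
  rows.uniq
end

# Analogue of GROUP BY id / DISTINCT ON (id): only the key is compared.
def dedupe_on_id(rows)
  rows.uniq(&:id)
end

rows = joined_rows
puts format('all columns: %.4fs', Benchmark.realtime { dedupe_all_columns(rows) })
puts format('id only:     %.4fs', Benchmark.realtime { dedupe_on_id(rows) })
```

Both approaches return the same rows here because id is unique per taxon name; they differ only in how much of each row has to be hashed and compared. Whether that accounts for the timings above would need an EXPLAIN ANALYZE on the real queries.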