Use .distinct instead of .group(:id) in new TN relationship filters #…
…3094

I initially used `.group(:id)` because I noticed one of the queries seemed really slow. I now think the only slow one was the "Subject or object of a relationship" case of the "Relationship subject/object" facet in Filter TNs, tested here on the UCD project on my personal computer. The result set is 58,337 records.

If I benchmark `taxon_names/index.json.jbuilder` with:
```ruby
time = Benchmark.measure do
  json.array!(@taxon_names) do |taxon_name|
    json.partial! '/taxon_names/attributes', taxon_name: taxon_name
  end
end
puts time.real
```
and run the filter described above against the code prior to this commit, I get:
~0.15s with `.group(:id)`
~2.5s with `.distinct` :(
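A single timing like the ones above can be noisy. As a rough sketch of how the measurement could be repeated, the helper below runs a block several times and reports the median wall-clock time. `median_realtime` is a hypothetical helper (not part of the codebase or of Ruby's `Benchmark` module), and the workload is a stand-in for the `json.array!` block.

```ruby
require 'benchmark'

# Hypothetical helper: run the block `runs` times and return the median
# wall-clock time in seconds.
def median_realtime(runs: 5)
  times = Array.new(runs) { Benchmark.realtime { yield } }.sort
  mid = times.length / 2
  times.length.odd? ? times[mid] : (times[mid - 1] + times[mid]) / 2.0
end

# Stand-in workload; in the jbuilder view this would be the json.array! block.
elapsed = median_realtime(runs: 5) { (1..10_000).map { |i| i * 2 } }
puts format('%.4fs', elapsed)
```

The median is used rather than the mean so that one slow run (cold cache, GC pause) does not dominate the figure.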

The scope being used is
```ruby
scope :with_taxon_name_relationships, -> {
    joins('LEFT OUTER JOIN taxon_name_relationships tnr1 ON taxon_names.id = tnr1.subject_taxon_name_id').
    joins('LEFT OUTER JOIN taxon_name_relationships tnr2 ON taxon_names.id = tnr2.object_taxon_name_id').
    where('tnr1.subject_taxon_name_id IS NOT NULL OR tnr2.object_taxon_name_id IS NOT NULL') # followed by either .distinct or .group(:id)
  }
```
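To make the duplication concrete, here is a small pure-Ruby sketch (no database involved) of the rows this kind of scope produces before `.distinct` or `.group(:id)` is applied, next to the rows from a single join with an OR condition. The `Relationship` struct, the ids, and the sample data are made up for illustration, and the data assumes no relationship has the same name as both subject and object. It only models join semantics and says nothing about how PostgreSQL actually executes either query.

```ruby
# Hypothetical in-memory stand-in for taxon_name_relationships.
Relationship = Struct.new(:subject_taxon_name_id, :object_taxon_name_id)

taxon_name_ids = [1, 2, 3, 4, 5]
relationships = [
  Relationship.new(1, 2), Relationship.new(1, 3), Relationship.new(1, 4),
  Relationship.new(2, 1), Relationship.new(3, 1)
]

# Two LEFT OUTER JOINs: every subject match is paired with every object match.
# A side with no match contributes a single nil (the NULL-extended row).
joined = taxon_name_ids.flat_map do |id|
  tnr1 = relationships.select { |r| r.subject_taxon_name_id == id }
  tnr2 = relationships.select { |r| r.object_taxon_name_id == id }
  tnr1 = [nil] if tnr1.empty?
  tnr2 = [nil] if tnr2.empty?
  tnr1.product(tnr2).map { |a, b| [id, a, b] }
end
# The WHERE clause drops rows where both sides are NULL.
double_join_rows = joined.reject { |_id, a, b| a.nil? && b.nil? }

# One inner join with an OR condition: one row per matching relationship.
or_join_rows = taxon_name_ids.flat_map do |id|
  relationships
    .select { |r| r.subject_taxon_name_id == id || r.object_taxon_name_id == id }
    .map { |r| [id, r] }
end

double_join_ids = double_join_rows.map(&:first).uniq
or_join_ids = or_join_rows.map(&:first).uniq

puts "double join: #{double_join_rows.size} rows, ids #{double_join_ids.inspect}"
puts "OR join:     #{or_join_rows.size} rows, ids #{or_join_ids.inspect}"
```

In this sample, name 1 is the subject of three relationships and the object of two, so it appears 3 × 2 = 6 times in the double join and 3 + 2 = 5 times in the OR join. Both row sets collapse to the same ids once deduplicated, which is why either form needs `.distinct` or `.group(:id)`.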

Using
```ruby
referenced_klass_union([
            ::TaxonName.with_taxon_name_relationships_as_subject,
            ::TaxonName.with_taxon_name_relationships_as_object
          ])
```
instead gives ~0.9s

Using
```ruby
::TaxonName.joins('join taxon_name_relationships ON ' \
            'taxon_names.id = taxon_name_relationships.subject_taxon_name_id OR ' \
            'taxon_names.id = taxon_name_relationships.object_taxon_name_id'
          ).distinct
```
instead gives ~0.7s (using `.group(:id)` brings it down to ~0.15-.02). I'm leaving it there for now, using `.distinct`, which I prefer for its semantics and to match(?) the rest of the codebase.

[
```ruby
::TaxonName.joins('join taxon_name_relationships ON ' \
            'taxon_names.id = taxon_name_relationships.subject_taxon_name_id OR ' \
            'taxon_names.id = taxon_name_relationships.object_taxon_name_id'
          ).select('DISTINCT ON (taxon_names.id) taxon_names.*')
```
gets you back down into the ~0.15s range, but `DISTINCT ON ()` is PostgreSQL-specific. I don't really understand why `.distinct` isn't optimized to 'distinct on id' in this case, since it runs across rows of TaxonName, which are unique on id...]
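The semantic difference between the three forms can be pictured with plain Ruby arrays. This is only an analogy for what each clause is asked to compare, not a description of PostgreSQL's planner, and the rows and column names below are invented sample data.

```ruby
# Hypothetical joined result: each taxon name row is repeated once per
# matching relationship.
rows = [
  { id: 1, name: 'Aus', rank: 'genus' },
  { id: 1, name: 'Aus', rank: 'genus' },
  { id: 2, name: 'Aus bus', rank: 'species' },
  { id: 1, name: 'Aus', rank: 'genus' },
  { id: 2, name: 'Aus bus', rank: 'species' }
]

# Analogue of SELECT DISTINCT taxon_names.*: equality is judged on every column.
distinct_all_columns = rows.uniq

# Analogue of GROUP BY id or DISTINCT ON (id): equality is judged on the key alone.
distinct_on_id = rows.uniq { |row| row[:id] }

puts distinct_all_columns == distinct_on_id
```

Because rows with the same `id` are identical in every other column, both forms return the same records; the difference is only in how much of each row has to be compared to get there.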
kleintom committed Feb 12, 2025
1 parent 9c91b1e commit 19c68e1
Showing 1 changed file with 10 additions and 5 deletions.
15 changes: 10 additions & 5 deletions lib/queries/taxon_name/filter.rb
@@ -588,13 +588,13 @@ def taxon_name_relationship_type_facet
   if taxon_name_relationship_type_subject.present?
     s = ::TaxonName.as_subject_with_taxon_name_relationship(
       taxon_name_relationship_type_subject
-    ).group(:id)
+    ).distinct
   end

   if taxon_name_relationship_type_object.present?
     o = ::TaxonName.as_object_with_taxon_name_relationship(
       taxon_name_relationship_type_object
-    ).group(:id)
+    ).distinct
   end

   if taxon_name_relationship_type_either.present?
@@ -635,11 +635,16 @@ def relation_to_relationship_facet
   return nil if relation_to_relationship.nil?

   if relation_to_relationship == 'subject'
-    ::TaxonName.with_taxon_name_relationships_as_subject.group(:id)
+    ::TaxonName.with_taxon_name_relationships_as_subject.distinct
   elsif relation_to_relationship == 'object'
-    ::TaxonName.with_taxon_name_relationships_as_object.group(:id)
+    ::TaxonName.with_taxon_name_relationships_as_object.distinct
   else
-    ::TaxonName.with_taxon_name_relationships.group(:id)
+    # 3-4x more time-performant than using
+    # :with_taxon_name_relationships.distinct
+    ::TaxonName.joins('join taxon_name_relationships ON ' \
+      'taxon_names.id = taxon_name_relationships.subject_taxon_name_id OR ' \
+      'taxon_names.id = taxon_name_relationships.object_taxon_name_id'
+    ).distinct
   end
 end

