Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[back] perf: optimize the perf of /recommendations/ API #1881

Open
GresilleSiffle opened this issue Jan 15, 2024 · 2 comments
Open

[back] perf: optimize the perf of /recommendations/ API #1881

GresilleSiffle opened this issue Jan 15, 2024 · 2 comments
Labels
Backend Back-end code of Tournesol
Projects

Comments

@GresilleSiffle
Copy link
Collaborator

GresilleSiffle commented Jan 15, 2024

See the comment #1876 (comment)

Context

The API recommendations can take more than 500ms to process a request.

The following SQL join on the table tournesol_entitycriteriascore seems to be the most expensive part of the query:

LEFT OUTER JOIN "tournesol_entitycriteriascore" ON (
  "tournesol_entity"."id" = "tournesol_entitycriteriascore"."entity_id"
)

Even if the criteria scores are prefetched in another query, this join allows us to compute the total_score based on potential weighted criteria present in the request query parameters.

Proposition

Determine

  1. if it's relevant to optimize this logic
  2. if there is a satisfactory way to optimize the SQL query or the Python code
  3. check any potential side effect on the total score and the results ordering
@GresilleSiffle GresilleSiffle added the Backend Back-end code of Tournesol label Jan 15, 2024
@GresilleSiffle GresilleSiffle self-assigned this Jan 15, 2024
@GresilleSiffle GresilleSiffle added this to Ideas / Backlog in kanban via automation Jan 15, 2024
@GresilleSiffle GresilleSiffle moved this from Ideas / Backlog to In progress in kanban Jan 15, 2024
@GresilleSiffle GresilleSiffle removed their assignment Jan 16, 2024
@GresilleSiffle GresilleSiffle moved this from In progress to Ready in kanban Jan 16, 2024
@GresilleSiffle
Copy link
Collaborator Author

GresilleSiffle commented Jan 16, 2024

One optimization could be to remove from the SQL join the criteria that are not required to compute the total score.

For instance, when a request contains criteria with a weight of 0, we can filter them in the SQL. This will exclude a lot of rows from the table tournesol_entitycriteriascore, and will greatly speed up the SQL query.

(edit careful, removing criteria scores from the response will make the "rated high" and "rated low" hint in the front end incorrect.)

The requests made by the front end will benefit from this change. By removing useless rows, the execution time of the SQL query used by the following request is reduced to ~ 60 ms from more than ~ 130 ms.

capture

@GresilleSiffle
Copy link
Collaborator Author

GresilleSiffle commented Jan 17, 2024

One optimization could be to remove from the SQL join the criteria that are not required to compute the total score.

This improves the performance only when fetching safe recommendations. When fetching unsafe
recommendations the query is slower (from ~ 260ms to ~ 320 ms).

safe

The following query excluding criteria with a weight of zero (here only importance has a weight):

FROM 
  "tournesol_entity" 
  LEFT OUTER JOIN "tournesol_entitycriteriascore" ON (
    "tournesol_entity"."id" = "tournesol_entitycriteriascore"."entity_id"
  ) 
  INNER JOIN "tournesol_entitypollrating" ON (
    "tournesol_entity"."id" = "tournesol_entitypollrating"."entity_id"
  ) 
WHERE 
  (
    (
      "tournesol_entity"."metadata" -> 'publication_date'
    ) <= '"2023-12-17T08:00:00+00:00"' 
    AND "tournesol_entitycriteriascore"."poll_id" = 1 
    AND "tournesol_entitycriteriascore"."score_mode" = 'default' 
    -- keep only criteria with non zero weight
    AND "tournesol_entitycriteriascore"."criteria" IN ('importance') 
    --
    AND "tournesol_entitypollrating"."poll_id" = 1 
    AND "tournesol_entitypollrating"."sum_trust_scores" > 1.5 
    AND "tournesol_entitypollrating"."tournesol_score" > 20
  )

... is more efficient than this one:

FROM 
  "tournesol_entity" 
  LEFT OUTER JOIN "tournesol_entitycriteriascore" ON (
    "tournesol_entity"."id" = "tournesol_entitycriteriascore"."entity_id"
  ) 
  INNER JOIN "tournesol_entitypollrating" ON (
    "tournesol_entity"."id" = "tournesol_entitypollrating"."entity_id"
  ) 
WHERE 
  (
    (
      "tournesol_entity"."metadata" -> 'publication_date'
    ) <= '"2023-12-17T08:00:00+00:00"' 
    AND "tournesol_entitycriteriascore"."poll_id" = 1 
    AND "tournesol_entitycriteriascore"."score_mode" = 'default' 
    AND "tournesol_entitypollrating"."poll_id" = 1 
    AND "tournesol_entitypollrating"."sum_trust_scores" > 1.5 
    AND "tournesol_entitypollrating"."tournesol_score" > 20
  )

unsafe

When fetching unsafe recommendations, the following query excluding criteria with a weight of zero (here only importance has a weight):

FROM 
  "tournesol_entity" 
  LEFT OUTER JOIN "tournesol_entitycriteriascore" ON (
    "tournesol_entity"."id" = "tournesol_entitycriteriascore"."entity_id"
  )
WHERE 
  (
    (
      "tournesol_entity"."metadata" -> 'publication_date'
    ) <= '"2023-12-17T08:00:00+00:00"' 
    -- keep only criteria with non zero weight
    AND "tournesol_entitycriteriascore"."criteria" IN ('importance') 
    --
    AND "tournesol_entitycriteriascore"."poll_id" = 1 
    AND "tournesol_entitycriteriascore"."score_mode" = 'default'
  ) 

... is less efficient than this one:

FROM 
  "tournesol_entity" 
  LEFT OUTER JOIN "tournesol_entitycriteriascore" ON (
    "tournesol_entity"."id" = "tournesol_entitycriteriascore"."entity_id"
  )
WHERE 
  (
    (
      "tournesol_entity"."metadata" -> 'publication_date'
    ) <= '"2023-12-17T08:00:00+00:00"' 
    AND "tournesol_entitycriteriascore"."poll_id" = 1 
    AND "tournesol_entitycriteriascore"."score_mode" = 'default'
  ) 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backend Back-end code of Tournesol
Projects
kanban
Ready
Development

No branches or pull requests

1 participant