Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emulation of FIRST_ROWS_N with DBs that don't support that #148

Open
GitMensch opened this issue May 24, 2023 · 0 comments
Open

Emulation of FIRST_ROWS_N with DBs that don't support that #148

GitMensch opened this issue May 24, 2023 · 0 comments

Comments

@GitMensch
Copy link
Contributor

GitMensch commented May 24, 2023

Background

In general a DB commonly knows exactly what is best to get the requested result with the minimal time and/or computing cost (at least Oracle specifies that by default it optimizes for cost, to enable the server handling thousands of requests at the same time).

With EXEC SQL often being a "replacement" for ORGANIZATION INDEXED, it is often necessary to "mimic" its benefits.

  • DB: computes and gets the result on SELECT (mostly affected by db server performance and result size, also by usable indexes), gets rows by FETCH (mostly affected by possible network io [roundtrip])
  • ISAM: sets the record position by START, then reads rows by READ

The benefit in the first case is that the result can (normally should) be directly limited; the benefit in the second case is that START is an "instant operation".

When people convert from ISAM to DB , the queries are often quite unspecified, most important SELECT something FROM tab WHERE keyfield >= val; most DBs will do a table scan for this query and prepare to result all following rows. If the application then re-executes this after FETCHing only some rows the performance heavily decreases.
A solution - if the application knows how much data to look for - is to add FIRST n ROWS ONLY / LIMIT to the statement; but that does not allow to fetch more data without a "restart" after this limit is reached.

Workaround (Oracle)

At least Oracle allows to specify "I (possibly) would like to get the complete result, but I want to get the first records asap"; this is done by applying the fetch_first_rows optimizer hint at the system/session/select level.
Apart from other effects it will normally lead to an appropriate index being used, even for a big result set - and it gets the first 1/10/1000 entries fast, sending them to the client and waiting for the client to fetch more.

Possible emulation?

Would it be possible to use something similar with GixSQL even outside of the oracle client (where it would just use ALTER SESSION or putting the hint into every query)?

An emulation (only possible if the query contains an ORDER BY) could ask "up to" 2 times:

a) async

  • add a FIRST n ROWS ONLY clause to the statement (or using a lower "n") when necessary
  • send the query to the DB (this will also handle "no result" and "error" conditions)
  • sent the return to the application, but before doing that (if there was no error and the returned rows are less than "n - 1") re-executing it without a limit in async mode
  • if the application FETCHes more than the artificial limit, switch to the result that comes from the async request

b) if needed - using row pagination [should there be a separate FR to allow (auto-)pagination on the driver site?]

  • add a FIRST n ROWS ONLY clause to the statement (or using a lower "n") when necessary
  • send the query to the DB
  • sent the return to the application
  • if the application FETCHes more than the artificial limit: ask again, but using an offset from the previous "n"

The option a) has the benefit of being guaranteed correct (we query the same thing directly after each other, so the resultset won't change) but has the downside that we increase the DB load (the first with the limit should be quite fast if the query is not super-complex, the second will need the same time as before).
The option b) has the benefit of lowering DB load (we only ask for a limited number), but if the transaction itself changes the underlying data, the "re-fetch query" will possibly get other data.

Of course: an emulation could be depending on the GixSQL connection string, it could default to "off" and even provide both options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant