Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for full text search (FTS 3,4,5) #25

Open
helje5 opened this issue Oct 11, 2023 · 1 comment
Open

Add support for full text search (FTS 3,4,5) #25

helje5 opened this issue Oct 11, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@helje5
Copy link
Member

helje5 commented Oct 11, 2023

It would be nice to detect FTS tables and do special things with them. I.e. also omit FTS backing tables from Lighter schemas (they need to be included in schemas if DB recreation is on, but no regular Swift funcs etc should be emitted for them?).

Table Detection

The tables can be detected by looking at the SQL in the metadata:

CREATE VIRTUAL TABLE papers USING fts3(...)

i.e. "VIRTUAL TABLE" and "USING fts*".

FTS tables, even contentful, have no associated column types (though they should be all TEXT?).

FTS creates a set of supporting, regular, tables that should be omitted, FTS 3:

papers_content 
papers_segments
papers_segdir  

FTS 4:

papers4_content 
papers4_segments
papers4_segdir  
papers4_docsize 
papers4_stat    

FTS 5:

email_data     
email_idx      
email_content  
email_docsize  
email_config   

So first the FTS table needs to be detected, than the associated tables need to be tagged/excluded.

External content FTS tables can refer to a backing table ("documents" in this case):

USING fts5(title, value, content='documents')

So that can be detected too (i.e. whether a regular table has one (or more! e.g. for multiple languages) FTS backing tables).

Operations

It is unclear what kind of operations would need to be supported? Everything in FTS centers around MATCH queries, like:

SELECT * FROM mail WHERE body MATCH 'sqlite';

Note that FTS table columns are completely untyped (but except for rowid always TEXT?)
The rowid is also more important to match up w/ the content table and needs to be exposed? (e.g. it can be aliased to an integer primary key and is also stable between VACUUMs).

Then FTS also has a set of special functions, like highlight and snippet:

SELECT highlight(email, 2, '<b>', '</b>') FROM email WHERE email MATCH 'fts5';

2 is the 0 based column index here ("email" is the FTS table). Also note that the table can be used on the left side of the match.

If an FTS table has a backing table (content='mytable'), maybe the ops should be directly attached to the content table, like sqlite_mytable_fts(...).

FTS 5 query language:

<phrase>    := string [*]
<phrase>    := <phrase> + <phrase>
<neargroup> := NEAR ( <phrase> <phrase> ... [, N] )
<query>     := [ [-] <colspec> :] [^] <phrase>
<query>     := [ [-] <colspec> :] <neargroup>
<query>     := [ [-] <colspec> :] ( <query> )
<query>     := <query> AND <query>
<query>     := <query> OR <query>
<query>     := <query> NOT <query>
<colspec>   := colname
<colspec>   := { colname1 colname2 ... }

Ordering has a special rank column (ORDER BY rank), that would need to be made available.

@helje5 helje5 added the enhancement New feature or request label Oct 11, 2023
@helje5
Copy link
Member Author

helje5 commented Oct 11, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant