As a second step in my goal to implement full text search functionality in my Delphi application I had to create a sample SQLite database supporting FTS queries.
Full text search models in databases allow to do what Google does with web documents – quickly search for documents containing some word (token).
FTS enabled tables in SQLite are special tables created using the virtual table paradigm. There’s lots of info on FTS tables and their construction and usage on the SQLite web site.
Reminder: my idea is to see how to have full test search implemented in a Delphi application processing files under some read-only folder structure.
Each file (to end up in my FTS table) has several pages and the content of a page should be stored in a FTS table.
Warning: from now on I will presume you have enough knowledge on SQL syntax and SQLite to be able to read and understand “CREATE TABLE…”, “INSERT INTO…” and alike.
Initial idea would be to create the database using the following model (simplified):
Document | FileContent |
Id (primary field)
FileName | Id (primary field)
DocumentId (foreign key / ref integrity)
PageNumber
PageContent |
Therefore two tables. First table “Document” stores file names, this table is a parent/master table, while table “FileContent” is a child table storing several records (page number and page text content) for each entry in its parent table (each record in child is for one page of the file being processed).
In the above model the DocumentID field in FileContent table should reference Document.Id so that SQL queries like Delete and Update on the Document table would also delete/update the corresponding rows in the FileContent table.
If the above could work (and it does not as you will see) the DDL SQL would look like:
CREATE TABLE Document (
Id INTEGER PRIMARY KEY,
FileName TEXT NOT NULL,
);
/*SQLite Full Text Search Tables */
CREATE VIRTUAL TABLE FileContent USING FTS4 (
DocumentId INTEGER REFERENCES Document(Id) ON DELETE CASCADE ON UPDATE CASCADE,
Page INTEGER,
PageContent TEXT,
UNIQUE(DocumentId, Page)
);
The above unfortunately does NOT work.
Virtual tables in SQLite do NOT support any constraints (referential integrity, foreign keys)!
DDL for FTS virtual table allowing foreign keys
Luckily, SQLite is enough powerful to support views and triggers and even triggers on views! There are ways to have best of both worlds: referential integrity on FTS virtual tables.
After some testing, the structure of the database best matching my requirements is :
CREATE TABLE Document (
Id INTEGER PRIMARY KEY,
FileName TEXT NOT NULL,
);
CREATE TABLE FilePages(
DocumentId INTEGER REFERENCES Document(Id) ON DELETE CASCADE ON UPDATE CASCADE,
Page INTEGER,
UNIQUE(DocumentId, Page)
);
CREATE VIRTUAL TABLE PageContent USING fts4();
CREATE TRIGGER FilePages_DELETE AFTER DELETE ON FilePages
BEGIN
/* required when ref. integrity deletes rows in FilePages as
a result of delete row(s) operation in Document */
DELETE FROM PageContent WHERE rowid = OLD.rowid;
END;
CREATE VIEW FTSData AS
SELECT FilePages.rowid AS rowid, FilePages.DocumentId, FilePages.Page, PageContent.Content
FROM FilePages JOIN PageContent ON FilePages.rowid = PageContent.rowid;
CREATE TRIGGER FTSData_INSERT INSTEAD OF INSERT ON FTSData
BEGIN
INSERT INTO FilePages(DocumentId, Page) VALUES (NEW.DocumentId, NEW.Page);
INSERT INTO PageContent (rowid, Content) VALUES (last_insert_rowid(), NEW.Content);
END;
CREATE TRIGGER FTSData_DELETE INSTEAD OF DELETE ON FTSData
BEGIN
DELETE FROM FilePages WHERE rowid = OLD.rowid;
DELETE FROM PageContent WHERE rowid = OLD.rowid;
END;
Visually:
Document [1 - ∞ FilePages] | FilePages [1 - 1 PageContent] | PageContent |
rowid
Id (primary field – alias to “rowid”)
FileName | rowid
DocumentId (foreign key / ref integrity)
Page | rowid Content |
In SQLite, every row of every table has a signed integer ROWID field. The ROWID for each row is unique among all rows in the same table. If a table contains a column of type INTEGER PRIMARY KEY, then that column becomes an alias for the ROWID. If no columns are specified when creating the FTS table, a single column “Content” table is created.
Recipe: 3 tables, 1 view, and 3 triggers. Triggers are database operations that are automatically performed when a specified database event occurs. Views are pre-packaged SELECT statements.
The view FTSData serves to join the FTS table PageContent with its “one-to-one” relation table “FilePages” (the relation is actually imposed by triggers). FilePages are in one-to-many relation with Document being the parent table.
SQL Test
insert into Document values (1, 'file 1');
insert into FTSData values(null, 1, 1, 'doc 1 page 1');
insert into FTSData values(null, 1, 2, 'doc 1 page 2');
insert into Document values (2, 'file 2');
insert into FTSData values(null, 2, 1, 'doc 2 page 1');
insert into FTSData values(null, 2, 2, 'doc 2 page 2');
/* and finally a simple FTS query */
SELECT FROM PageContent WHERE MATCH 'page'
Any comments / better ideas?