Quantcast
Channel: Planet Object Pascal
Viewing all articles
Browse latest Browse all 1725

Žarko Gajić: Full Text TXT Search In Delphi (+FireDac +SQLite). Sample Working Project.

$
0
0

delphi-sqlite-firedac-fts
Finally, the sample project for my full text search implementation in Delphi is ready. The idea involves providing a user of my application the ability to do full text (Google like) searches on a set of documents by searching inside document content. For the task, I’ve selected SQLite for the database and FireDac for the database connectivity and operations.

For the sake of simplicity and the proof-of-concept approach this sample program is really simple: it will index all TXT files under some directory (including sub directories) and allow to do simple “match” type full text search queries. Certainly, in my real world application, the task is more complex and involves non-TXT files, some Windows service application and alike … but for the start, if you are up to implementing full text search, this sample project should give you a kick-start.

Download Sample Project Source Code. Download Executable Only.

The database selected is SQLite. SQLite is a self-contained, serverless, zero-configuration, transactional SQL database engine, and has the FTS engine built inside.
To connect to and operate on the database I’ve picked FireDac. FireDac knows how to talk to SQlite (any many other databases) and has proven to be robust enough when working with SQLite.

Here’s how this sample program works:

  1. The user selects a folder containing TXT files (in any subfolders)
  2. If the full text index database has not been created (i.e. the .SDB file does not exist)
    • the database is created
    • TXT files and their content are imported into database
  3. Display the files in a tree (Virtual Tree View)
  4. Each file / folder node has a check box to allow selection
  5. FTS queries can be done against the range of selected files

The Database: Multi User Read-Write SQLite Connection

SQLite is primarily used in single connection access to the database scenarios. In my scenario I have to allow multiple access (from various application and application instances) to open, read and write to the database. The TADConnection (+TADPhysSQLiteDriverLink) component is used to connect and talk to the database. By default, SQLite driver settings are set for single connection, to allow multiple connections one has to alter some parameters. Here’s how my connection parameters look:

DriverID=SQLite
Database= …
OpenMode=CreateUTF16
Password=aes-256:12345
LockingMode=Normal
JournalMode=WAL
Synchronous=Full
SQLiteAdvanced=temp_store=MEMORY
SQLiteAdvanced=page_size=4096
SQLiteAdvanced=auto_vacuum=FULL

SQLite database will be created at a connection establishment if it does not exist. I would suggest to read above the above parameters used in FireDac help.

After the connection has been established and the database has been created with tables to support FTS queries (+ referential integrity), we fill in the database:

The Data: TXT files and Their Content

A simple TDirectory.GetFiles is used to get all the TXT files in a selected directory.

var
  subFiles : TStringDynArray;
  subFile : string;
begin
  subFiles := TDirectory.GetFiles(
                main.MainForm.RootDirectory,
                TSearchOption.soAllDirectories,
                function(const Path: string; const SearchRec: TSearchRec): Boolean
                begin
                  result := SameText(ExtractFileExt(SearchRec.Name), '.txt');
                end);

  for subFile in subFiles do
    UpdateFileInDB(subFile);
end;

Files are imported into the database one by one:

function TDM.UpdateFileInDB(const fileName: string; const removeOld : boolean = false): boolean;
var
  sStream: TStringStream;
  documentRowID : integer;
begin
  result := true;

  if removeOld then
    sqliteConnection.ExecSQL('DELETE FROM Document WHERE FileName = ' + QuotedStr(fileName));

  sStream := TStringStream.Create;
  try
    sStream.LoadFromFile(fileName);

    try
      MainForm.LogInMemo(' +' + fileName);

      sqliteConnection.ExecSQL('INSERT INTO Document VALUES (NULL, :fileName)', [fileName]);

      documentRowID := sqliteConnection.ExecSQLScalar('SELECT LAST_INSERT_ROWID() AS rowid');

      //1 page per document for the sake of simplicity
      sqliteConnection.ExecSQL('INSERT INTO FTSData VALUES (NULL, :id, :page, :txt)', [documentRowID, 1, sStream.DataString]);
    except on E: Exception do
      begin
        result := false;
        MessageDlg('Error writing to database:' + E.Message, mtError, [mbOk], -1);
      end;
    end;
  finally
    sStream.Free;
  end;
end;

Now, all the files have been “indexed” and they are presented in the tree view control. I’ve used Virtual TreeView.

The FTS Search

Finally, specify what files to include in the FTS search, specify your token (word) you are looking for and hit “Search”:

procedure TDM.RetrieveSearchResults(const searchToken: string; const inFiles : TStringList = nil);
var
  fn : string;
  st : integer;
begin
  //get search results
  sqlQuery.SQL.Clear;

  //temp table for selected files
  sqlQuery.SQL.Add('CREATE TEMPORARY TABLE IF NOT EXISTS PartOfDocument (FileName STRING);');
  sqlQuery.SQL.Add('DELETE FROM PartOfDocument;');

  if Assigned(inFiles) then
    for fn in inFiles do
      sqlQuery.SQL.Add('INSERT INTO PartOfDocument VALUES (' + QuotedStr(fn) + ');');


  //main fts query
  sqlQuery.SQL.Add('SELECT Document.Id, Document.FileName FROM PageContent');
  sqlQuery.SQL.Add('INNER JOIN FilePages ON FilePages.rowid = PageContent.rowid');
  sqlQuery.SQL.Add('INNER JOIN Document ON Document.rowid = FilePages.DocumentID');
  if Assigned(inFiles) AND (inFiles.Count > 0) then
    sqlQuery.SQL.Add('INNER JOIN PartOfDocument ON PartOfDocument.FileName = Document.FileName');
  sqlQuery.SQL.Add('WHERE PageContent MATCH ' + QuotedStr(searchToken));
  sqlQuery.SQL.Add(';');

  sqlQuery.OpenOrExecute;

  MainForm.SearchResults.Clear;
  while NOT sqlQuery.Eof do
  begin
    MainForm.SearchResults.Add(
      TSearchResult.Create(
        sqlQuery.Fields[0].AsInteger, //doc.id
        sqlQuery.Fields[1].AsString  //file name
        ));
    sqlQuery.Next;
  end;

  sqlQuery.Close;
end;

A temporary table is created (if it does not exist), filled in with selected file names. The actual search query is a JOIN between different tables but the most important part is the MATCH on the FTS table. Results are displayed in a list view.

That’s it. Questions? Comments? More than welcome!


Viewing all articles
Browse latest Browse all 1725

Trending Articles