Partial match in GIN

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Partial match in GIN
Date: 2008-04-04 20:20:23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

We (Oleg and me) would like to present patch implements partial match for GIN
index and two extensions which use this new feature. We hope that after short
review they will be committed to CVS.

This work was sponsored by EnterpriseDB.
Implements partial match for GIN. It extends interface of support function but
keeps backward compatibility. The basic idea is to find first greater or equal
value in index and scan sequentially until support function says stop. For each
matched entry all corresponding ItemPointers are collected in TIDBitmap
structure to effective merge ItemPointers from different entries. Patch
introduces following changes in interface:
- compare function has third (optional) argument, of boolean type, it points to
kind of compare: partial or exact match. If argument is equal to 'false',
function should produce comparing as usual, else function's result is
treated as:
= 0 - match
< 0 - doesn't match but continue scan
> 0 - stop scan
- extractQuery function has fourth (optional) argument of bool** type. Function
is responsible to allocate correct memory for that array with the same size
as returning array of searching entries. if extractQuery wishs to point
partial match for some entry it should set corresponding element of bool
array to true.

If function described above hasn't extra arguments then GIN will not be able to
use partial match.
Implements prefix search. This was one of the most wanted feature of text
search. Lexeme to partial match should be labeled with asterisk:

select count(*) from apod where fti @@ 'star:*';
or even
select count(*) from apod where fti @@ to_tsquery('star:*');

Dictionary may set a normalized lexeme with flag (TSL_PREFIX) to point to its
prefix path.

Here there is a unclean issue: now tsquery has new flag to label prefix search
and cstring representation has backward compatibility, but external binary
hasn't it now. Now, extra byte is used for storage of this flag. In other hand,
there 4 unused bits in external binary representation (in byte stores weights of
lexeme), so it's possible to use one of them to store this flag. What are opinions?
In short, it's a contrib module that speeds up LIKE operation with any kind of
expression, like 'foo%bar' or '%foo%' or even '%foo%bar'. This module is based
on partial match patch of GIN.

NOTICE 1: current index support of LIKE believes that only BTree can speed up
LIKE and becomes confused with this module with error 'unexpected opfamily' in
prefix_quals(). For this reason, partial match patch adds small check before
calling expand_indexqual_opclause().

NOTICE 2: it seems to me, that similar technique could be implemented for
ordinary BTree to eliminate hack around LIKE support.

Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru


Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2008-04-04 21:04:23 Re: Replace offnum++ by OffsetNumberNext
Previous Message Tom Lane 2008-04-04 20:16:09 Re: Expose checkpoint start/finish times into SQL.