Skip site navigation (1) Skip section navigation (2)

Partial match in GIN

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Partial match in GIN
Date: 2008-04-04 20:20:23
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-patches
We (Oleg and me) would like to present patch implements partial match for GIN
index and two extensions which use this new feature. We hope that after short
review they will be committed to CVS.

This work was sponsored by EnterpriseDB.
Implements partial match for GIN. It extends interface of support function but
keeps backward compatibility. The basic idea is to find first greater or equal
value in index and scan sequentially until support function says stop. For each
matched entry all corresponding ItemPointers are collected in TIDBitmap
structure to effective merge ItemPointers from different entries. Patch
introduces following changes in interface:
  - compare function has third (optional) argument, of boolean type, it points to
    kind of compare: partial or exact match. If argument is equal to 'false',
    function should produce comparing as usual, else function's result is
    treated as:
        = 0  - match
        < 0  - doesn't match but continue scan
        > 0  - stop scan
  - extractQuery function has fourth (optional) argument of bool** type. Function
    is responsible to allocate correct memory for that array with the same size
    as returning array of searching entries. if extractQuery wishs to point
    partial match for some entry it should set corresponding element of bool
    array to true.

If function described above hasn't extra arguments then GIN will not be able to
use partial match.
Implements prefix search. This was one of the most wanted feature of text 
search. Lexeme to partial match should be labeled with asterisk:

select count(*) from apod where fti @@ 'star:*';
or even
select count(*) from apod where fti @@ to_tsquery('star:*');

Dictionary may set a normalized lexeme with flag (TSL_PREFIX) to point to its 
prefix path.

Here there is a unclean issue: now tsquery has new flag to label prefix search 
and cstring representation has backward compatibility, but external binary 
hasn't it now. Now, extra byte is used for storage of this flag. In other hand, 
there 4 unused bits in external binary representation (in byte stores weights of 
lexeme), so it's possible to use one of them to store this flag. What are opinions?
In short, it's a contrib module that speeds up LIKE operation with any kind of
expression, like 'foo%bar' or '%foo%' or even '%foo%bar'. This module is based
on partial match patch of GIN.

NOTICE 1: current index support of LIKE believes that only BTree can speed up 
LIKE and becomes confused with this module with error 'unexpected opfamily' in
prefix_quals(). For this reason, partial match patch adds small check before
calling expand_indexqual_opclause().

NOTICE 2: it seems to me, that similar technique could be implemented for 
ordinary BTree to eliminate hack around LIKE support.

Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru


pgsql-patches by date

Next:From: Tom LaneDate: 2008-04-04 21:04:23
Subject: Re: Replace offnum++ by OffsetNumberNext
Previous:From: Tom LaneDate: 2008-04-04 20:16:09
Subject: Re: Expose checkpoint start/finish times into SQL.

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group