Re: Proposal: q-gram GIN and GiST indexes

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: q-gram GIN and GiST indexes
Date: 2011-04-05 13:39:15
Message-ID: BANLkTi=vY-MG_eyyONEqPj4uy3fOY=P0qg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 5, 2011 at 5:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I am probably being stupid here, but doesn't the number of links to
> rows grow proportionately to the number of n-grams?

The number of links to rows grows proportionally to the total number of
extracted q-grams, not to the number of unique q-grams. However, if the same
q-gram occurs more than once within a single indexed value, the number of
links is reduced (though this is rare).
Let's consider a simple example. Two rows contain the strings 'aaa' and
'aaab'. We extract the 3-gram 'aaa' from the first string and the 3-grams
'aaa' and 'aab' from the second string (for simplicity, there is no padding
here). The GIN index will contain a structure that can be represented as
follows:
'aaa' => 1, 2
'aab' => 2
We can see that there are 2 unique 3-grams, but 3 links to rows.
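
The same counting can be sketched in a few lines of Python. This is only an
illustration of the inverted-index idea, not how GIN is implemented; the
function names qgrams and build_posting_lists are made up for this example,
and, as above, no padding is applied.

```python
from collections import defaultdict


def qgrams(text, q=3):
    """Return the set of distinct q-grams of text (no padding)."""
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def build_posting_lists(rows, q=3):
    """Map each q-gram to the sorted list of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        # A q-gram repeated inside one value still yields a single link.
        for gram in qgrams(text, q):
            index[gram].add(row_id)
    return {gram: sorted(ids) for gram, ids in index.items()}


rows = {1: "aaa", 2: "aaab"}
index = build_posting_lists(rows)

unique_grams = len(index)                         # number of index keys
links = sum(len(ids) for ids in index.values())   # number of row links

print(index)
print(unique_grams, links)
```

For the two rows above this gives 2 keys and 3 links. A value such as 'aaaa'
contains 'aaa' twice, but would add only one link, which is the rare
reduction mentioned earlier.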

----
With best regards,
Alexander Korotkov.
