Re: Understanding GIN posting trees

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Understanding GIN posting trees
Date: 2011-07-14 21:10:00
Message-ID: 4E1F5B28.1060506@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14.07.2011 17:28, Teodor Sigaev wrote:
>> Why is the posting tree a tree? AFAICS, we never search it using the
>> TID, it's always scanned in whole. It would be simpler to store the TIDs
>> in a posting list in no particular order. This could potentially make
>> insertions cheaper, as you could just append to the last posting list
>> page for the key, instead of traversing the posting tree to a particular
>> location. You could also pack the tids denser, as you wouldn't need to
>> reserve free space for additions in the middle.
> For consistentFn call we need to collect all data for current tid. We do
> that by scanning over logical ordered arrays of tids and trees allows to
> do that by scanning a leafs pages.

Oh, I see. You essentially do a merge join of all the posting trees of
query keys.

Hmm, but we do need to scan all the posting trees of all the matched
keys in whole anyway. We could collect all TIDs in the posting lists of
all the keys into separate TIDBitmaps, and then combine the bitmaps,
calling consistentFn for each TID that was present in at least one
bitmap. I guess the performance characteristics of that would be
somewhat different from what we have now, and you'd need to keep a lot
of in-memory bitmaps if the query contains a lot of keys.

While we're at it, it just occurred to me that it if the number of query
keys is limited, say <= 16, you could build a lookup table for each
combination of keys either occurring or not. You could use then use that
instead of calling consistentFn for each possible match. You could even
use the table to detect common cases like "all/any keys must match",
perhaps opening some optimization opportunities elsewhere.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2011-07-14 21:10:19 Re: pg_class.relistemp
Previous Message Simon Riggs 2011-07-14 21:07:53 Re: Is there a committer in the house?