GiST: PickSplit and multi-attr indexes

From: Neil Conway <neilc(at)samurai(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)stack(dot)net>
Subject: GiST: PickSplit and multi-attr indexes
Date: 2004-11-14 09:21:43
Message-ID: 419723A7.5040505@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Oleg & Teodor,

If I understand the code correctly, GiST will only pass the first
attribute of each index tuple to the user-defined PickSplit method when
it wants to split a node. (see circa line 1269 of gist.c)

Is this a wise design decision? Granted, in many situations the first
attribute in the index is sufficient to make a reasonable decision about
how to divide the node into two halves, but I don't think that is
universally true. For example, consider a two column index whose first
attribute has a small number of distinct values. It could well be that
all the first attribute values in a node-to-be-split would be the same.
Only passing the first attribute to PickSplit would result in an
essentially random distribution of tuples among the split nodes, rather
than allowing the GiST extension to make use of the second attribution
to partition the nodes. That's an extreme example, but it is easy to
construct more realistic scenarios (basically, any situation in which
the cardinality of the first index attribute is low -- a reasonably
common occurrence with a multi-column index, I believe).

I'm not sure whether this would be a problem in practice. Speculation:
repeated invocations of PickSplit are one of the main factors in
deriving the ultimate shape of the GiST tree. Poor distribution of keys
resulting from PickSplit would eventually result in unnecessarily loose
bounding predicates in internal nodes, which would hurt performance.

Comments welcome,

Neil

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joachim Wieland 2004-11-14 09:33:01 postmaster segfaults with HUGE table
Previous Message Thomas Hallgren 2004-11-14 08:47:11 Re: CVS should die