Re: GiST: PickSplit and multi-attr indexes

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)MIT(dot)EDU>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: GiST: PickSplit and multi-attr indexes
Date:	2004-11-16 21:37:20
Message-ID:	2761.1100641040@sss.pgh.pa.us
Lists:	pgsql-hackers

Greg Stark <gsstark(at)MIT(dot)EDU> writes:
> The approach they take is to have a function which calculates an
> abstract "distance" between any two entries. There's an algorithm that
> they use to pick the split based on this distance function.

> If you abandoned "PickSplit" and instead exposed this distance
> function as the external API then the behaviour for multi-column
> indexes is clear. You calculate the distance along all the axes and
> calculate the diagonal distance.

Hmm ... the problem with that is the assumption that different opclasses
will compute similarly-scaled distances. If opclass A generates
distances in the range (0,1e6) while B generates in the range (0,1),
combining them with Euclidean distance won't work well at all. OTOH you
can't blindly normalize, because in some cases maybe the data is such
that a massive difference in distances is truly appropriate.
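
To make the scaling concern concrete, here is a minimal sketch in Python. It is not PostgreSQL code: the function name combined_distance and the sample per-column distances are hypothetical, chosen only to match the (0,1e6) and (0,1) ranges mentioned above. It combines per-column distances as a plain Euclidean ("diagonal") distance and measures how much the small-scale column can move the result.

```python
import math


def combined_distance(per_column):
    """Euclidean ("diagonal") combination of per-column distances."""
    return math.sqrt(sum(d * d for d in per_column))


# Hypothetical per-column distances between two index entries:
# column A comes from an opclass with distances in (0, 1e6),
# column B from an opclass with distances in (0, 1).
near_in_b = combined_distance([500_000.0, 0.01])
far_in_b = combined_distance([500_000.0, 0.99])

# Relative change in the combined distance when only column B moves
# across almost its entire range.
relative_effect_of_b = (far_in_b - near_in_b) / near_in_b
print(relative_effect_of_b)
```

With these assumed numbers, sweeping column B across nearly its whole range changes the combined distance by a vanishingly small fraction, so any split chosen from that distance would effectively ignore B. Dividing each column by an assumed maximum would even things out, but that is exactly the blind normalization that may be wrong for some data.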

I'm also a bit leery of the assumption that every GiST application can
reduce its PickSplit logic to Euclidean distances.

regards, tom lane

In response to

Re: GiST: PickSplit and multi-attr indexes at 2004-11-16 21:12:57 from Greg Stark
