Re: GiST seems to drop left-branch leaf tuples

From: Peter Tanski <ptanski(at)raditaz(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-24 03:12:18
Message-ID: 218BEF96-3524-41EB-A15C-67CA3DAD4B58@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I found another off-by-one error in my Picksplit() algorithm and the GiST index contains one leaf tuple for each row in the table now. The error was to start from 1 instead of 0 when assigning the entries. Thanks to everyone for your help.

For the record, this is the only GiST index I know of where the keys are over 2000 bytes in size. So GiST definitely handles large keys. Perhaps the maximum size for intarray could be increased.

On Nov 23, 2010, at 4:01 PM, Yeb Havinga wrote:

> On 2010-11-23 20:54, Peter Tanski wrote:
>> On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote:
>>>>>> j = 0;
>>>>>> for (i = FirstOffsetNumber; i< maxoff; i = OffsetNumberNext(i)) {
>>>>>> FPrint* v = deserialize_fprint(entv[i].key);
>>>>> Isn't this off by one? Offset numbers are 1-based, so the maxoff
>>>>> computation is wrong.
>>> The first for loop of all others compare with i<= maxoff instead of i< maxoff.
>> You are right: I am missing the last one, there. (During a memory-debugging phase entv[entryvec-n - 1] was always invalid, probably as a memory overwrite error but I fixed that later and never changed it back.)
>>
>> On the other hand, there are two problems:
>>
>> 1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 64-bit machine). The example test cases I used were smaller in order to get around that issue: they are 2326 bytes base size.
>>
>> 2. Even after fixing the Picksplit() loop, the dropped-leaf problem still manifests itself:
> I noticed an n_entries intialization in one of your earlier mails that might also be a source of trouble. I was under the impression that gistentryvectors have n-1 entries (not n-2 as you say), because the first element (0 / InvalidOffsetNumber) must be skipped. E.g. entryvec->n = 5. This means that there are 4 entries, which are in array positions 1,2,3,4.
>
> btw: interesting topic, audio fingerprinting!
>
> regards,
> Yeb Havinga
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-11-24 03:13:08 Re: Latches with weak memory ordering (Re: max_wal_senders must die)
Previous Message Robert Haas 2010-11-24 03:07:01 Re: security hooks on object creation