From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Peter Tanski <ptanski(at)raditaz(dot)com> |
Cc: | Yeb Havinga <yebhavinga(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: GiST seems to drop left-branch leaf tuples |
Date: | 2010-11-24 06:09:20 |
Message-ID: | Pine.LNX.4.64.1011240907350.12632@sn.sai.msu.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Peter, glad to know you succeeded. FYI, a year ago we developed GiST
extension for rdkit.org.
Oleg
On Tue, 23 Nov 2010, Peter Tanski wrote:
> I found another off-by-one error in my Picksplit() algorithm and the GiST index contains one leaf tuple for each row in the table now. The error was to start from 1 instead of 0 when assigning the entries. Thanks to everyone for your help.
>
> For the record, this is the only GiST index I know of where the keys are over 2000 bytes in size. So GiST definitely handles large keys. Perhaps the maximum size for intarray could be increased.
>
> On Nov 23, 2010, at 4:01 PM, Yeb Havinga wrote:
>
>> On 2010-11-23 20:54, Peter Tanski wrote:
>>> On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote:
>>>>>>> j = 0;
>>>>>>> for (i = FirstOffsetNumber; i< maxoff; i = OffsetNumberNext(i)) {
>>>>>>> FPrint* v = deserialize_fprint(entv[i].key);
>>>>>> Isn't this off by one? Offset numbers are 1-based, so the maxoff
>>>>>> computation is wrong.
>>>> The first for loop of all others compare with i<= maxoff instead of i< maxoff.
>>> You are right: I am missing the last one, there. (During a memory-debugging phase entv[entryvec-n - 1] was always invalid, probably as a memory overwrite error but I fixed that later and never changed it back.)
>>>
>>> On the other hand, there are two problems:
>>>
>>> 1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 64-bit machine). The example test cases I used were smaller in order to get around that issue: they are 2326 bytes base size.
>>>
>>> 2. Even after fixing the Picksplit() loop, the dropped-leaf problem still manifests itself:
>> I noticed an n_entries intialization in one of your earlier mails that might also be a source of trouble. I was under the impression that gistentryvectors have n-1 entries (not n-2 as you say), because the first element (0 / InvalidOffsetNumber) must be skipped. E.g. entryvec->n = 5. This means that there are 4 entries, which are in array positions 1,2,3,4.
>>
>> btw: interesting topic, audio fingerprinting!
>>
>> regards,
>> Yeb Havinga
>>
>
>
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Smith | 2010-11-24 06:11:31 | Re: Horizontal Write Scaling |
Previous Message | Tom Lane | 2010-11-24 06:06:17 | Re: final patch - plpgsql: for-in-array |