Re: Yet another fast GiST build

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Subject: Re: Yet another fast GiST build
Date: 2021-04-07 13:18:53
Message-ID: 7386285b-0e2f-e89e-81f4-f63775becb2e@iki.fi
Lists: pgsql-hackers

On 07/04/2021 15:12, Andrey Borodin wrote:
>> On 7 Apr 2021, at 14:56, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
>> wrote:
>>
>> Ok, I think I understand that now. In btree_gist, the *_cmp()
>> function operates on non-leaf values, and *_lt(), *_gt() et al
>> operate on leaf values. For all other datatypes, the leaf and
>> non-leaf representation is the same, but for bit/varbit, the
>> non-leaf representation is different. The leaf representation is
>> VarBit, and non-leaf is just the bits without the 'bit_len' field.
>> That's why it is indeed correct for gbt_bitcmp() to just use
>> byteacmp(), whereas gbt_bitlt() et al compare the 'bit_len' field
>> separately. That's subtle, and 100% uncommented.
>>
>> What that means for this patch is that gbt_bit_sort_build_cmp()
>> should *not* call byteacmp(), but bitcmp(). Because it operates on
>> the original datatype stored in the table.
>
> +1 Thanks for investigating this. If I understand things right,
> adding test values with different lengths of bit sequences would not
> uncover the problem anyway?

That's right, the only consequence of a "wrong" sort order is that the
quality of the tree suffers, and scans need to scan more pages
unnecessarily.
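
To illustrate how the two kinds of comparison can disagree, here is a
minimal standalone sketch. It is not PostgreSQL code: ToyVarBit,
toy_opaque_cmp() and toy_bit_cmp() are hypothetical names, and the struct
is a simplified stand-in for a leaf value (a length header followed by
the bits), assuming bit lengths below 256 so that the byte order of the
header does not matter. toy_opaque_cmp() treats the whole value as opaque
bytes, roughly what happens if a bytea-style comparison is applied to the
leaf representation, while toy_bit_cmp() compares the bits first and uses
the bit length only as a tie-breaker.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of a leaf value: header, then the bits. */
typedef struct ToyVarBit
{
    int32_t       bit_len;  /* number of valid bits (kept < 256 here) */
    unsigned char bits[4];  /* bit data, MSB first; unused bits are zero */
} ToyVarBit;

/*
 * Opaque byte comparison of the whole value.  The length header is
 * compared before any of the bits, so values effectively sort by
 * bit_len first.
 */
static int
toy_opaque_cmp(const ToyVarBit *a, const ToyVarBit *b)
{
    return memcmp(a, b, sizeof(ToyVarBit));
}

/*
 * Bit-aware comparison: compare the common prefix of the bit data, and
 * fall back to the bit length only when that prefix is equal.
 */
static int
toy_bit_cmp(const ToyVarBit *a, const ToyVarBit *b)
{
    int bytes_a = (a->bit_len + 7) / 8;
    int bytes_b = (b->bit_len + 7) / 8;
    int cmp = memcmp(a->bits, b->bits,
                     bytes_a < bytes_b ? bytes_a : bytes_b);

    if (cmp == 0 && a->bit_len != b->bit_len)
        cmp = (a->bit_len < b->bit_len) ? -1 : 1;
    return cmp;
}
```

With a = B'1' (bit_len 1, first byte 0x80) and b = B'01' (bit_len 2,
first byte 0x40), toy_bit_cmp() puts a after b, while toy_opaque_cmp()
puts a before b because it looks at the length header first. Both are
valid total orders, so a sort build does not fail with either one; they
just group the values differently.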

I tried to investigate this by creating a varbit index with and without
sorting and comparing the two with pageinspect, but in quick testing I
wasn't able to find a case where the sorted version was badly ordered. I
guess I haven't found the right data set yet.

- Heikki
