Re: Yet another fast GiST build

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Subject: Re: Yet another fast GiST build
Date: 2021-04-07 11:56:42
Message-ID: c0846e34-8b3a-e1bf-c88e-021eb241a481@iki.fi
Lists: pgsql-hackers

On 07/04/2021 09:00, Heikki Linnakangas wrote:
> On 08/03/2021 19:06, Andrey Borodin wrote:
>> There were numerous GiST-build-related patches in this thread. Yet uncommitted is a patch with sortsupport routines for btree_gist contrib module.
>> Here's its version which needs review.
>
> Reviewing this now again. One thing caught my eye:
>
>> +static int
>> +gbt_bit_sort_build_cmp(Datum a, Datum b, SortSupport ssup)
>> +{
>> + return DatumGetInt32(DirectFunctionCall2(byteacmp,
>> + PointerGetDatum(a),
>> + PointerGetDatum(b)));
>> +}
>
> That doesn't quite match the sort order used by the comparison
> functions, gbt_bitlt and such. The comparison functions compare the bits
> first, and use the length as a tie-breaker. Using byteacmp() will
> compare the "bit length" first. However, gbt_bitcmp() also uses
> byteacmp(), so I'm a bit confused. So, huh?

Ok, I think I understand that now. In btree_gist, the *_cmp() function
operates on non-leaf values, and *_lt(), *_gt() et al operate on leaf
values. For all other datatypes, the leaf and non-leaf representation is
the same, but for bit/varbit, the non-leaf representation is different.
The leaf representation is VarBit, and non-leaf is just the bits without
the 'bit_len' field. That's why it is indeed correct for gbt_bitcmp() to
just use byteacmp(), whereas gbt_bitlt() et al compare the 'bit_len'
field separately. That's subtle, and 100% uncommented.

What that means for this patch is that gbt_bit_sort_build_cmp() should
*not* call byteacmp(), but bitcmp(), because it operates on the original
datatype stored in the table.
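
To make the mismatch concrete, here is a minimal standalone sketch in C.
It does not use any PostgreSQL headers: ToyVarBit, toy_bit_cmp() and
toy_bytewise_cmp() are hypothetical names, and the big-endian length
prefix is an assumption picked so that the result does not depend on the
platform. It only illustrates how "bits first, length as tie-breaker"
and "byte-wise over a payload that starts with the length" can disagree;
it is not the actual varbit or bytea code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical toy type, NOT PostgreSQL's VarBit: a bit length followed
 * by up to 32 packed bits, zero-padded in the last byte.
 */
typedef struct
{
    int32_t       bit_len;
    unsigned char bits[4];
} ToyVarBit;

/* Number of bytes needed to hold bit_len bits. */
static size_t
toy_nbytes(const ToyVarBit *v)
{
    assert(v->bit_len >= 0 && v->bit_len <= 32);
    return ((size_t) v->bit_len + 7) / 8;
}

/* Normalize a memcmp()-style result to -1, 0 or 1. */
static int
toy_sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Compare the bits first; use the bit length only as a tie-breaker. */
static int
toy_bit_cmp(const ToyVarBit *a, const ToyVarBit *b)
{
    size_t na = toy_nbytes(a);
    size_t nb = toy_nbytes(b);
    int    c = memcmp(a->bits, b->bits, na < nb ? na : nb);

    if (c != 0)
        return toy_sign(c);
    return (a->bit_len > b->bit_len) - (a->bit_len < b->bit_len);
}

/* Lay the value out as [4-byte big-endian bit_len][bits]; return its size. */
static size_t
toy_serialize(const ToyVarBit *v, unsigned char out[8])
{
    size_t   n = toy_nbytes(v);
    uint32_t len = (uint32_t) v->bit_len;

    out[0] = (unsigned char) (len >> 24);
    out[1] = (unsigned char) (len >> 16);
    out[2] = (unsigned char) (len >> 8);
    out[3] = (unsigned char) len;
    memcpy(out + 4, v->bits, n);
    return 4 + n;
}

/*
 * Byte-wise comparison over the whole payload, length header included.
 * If one payload is a prefix of the other, the shorter one sorts first.
 */
static int
toy_bytewise_cmp(const ToyVarBit *a, const ToyVarBit *b)
{
    unsigned char ba[8];
    unsigned char bb[8];
    size_t        la = toy_serialize(a, ba);
    size_t        lb = toy_serialize(b, bb);
    int           c = memcmp(ba, bb, la < lb ? la : lb);

    if (c != 0)
        return toy_sign(c);
    return (la > lb) - (la < lb);
}
```

With a = B'1' (bit_len 1, bits 0x80) and b = B'01' (bit_len 2, bits
0x40), toy_bit_cmp() says a sorts after b, while toy_bytewise_cmp() says
a sorts before b, because the length prefix is compared before any of
the bits.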

- Heikki
