Re: Yet another fast GiST build

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Subject: Re: Yet another fast GiST build
Date: 2021-04-07 10:23:42
Message-ID: 53ff60bc-9eee-c723-54de-1615433a3610@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/04/2021 09:00, Heikki Linnakangas wrote:
> On 08/03/2021 19:06, Andrey Borodin wrote:
>> There were numerous GiST-build-related patches in this thread. Yet uncommitted is a patch with sortsupport routines for btree_gist contrib module.
>> Here's its version which needs review.

Committed with small fixes. I changed the all functions to use
*GetDatum() and DatumGet*() macros, instead of just comparing Datums
with < and >. Datum is unsigned, while int2, int4, int8 and money are
signed, so that changes the sort order around 0 for those types to be
the same as the picksplit and picksplit functions use. Not a correctness
issue, the sorting only affects the quality of the index, but let's be tidy.

This issue remains:

> Reviewing this now again. One thing caught my eye:
>
>> +static int
>> +gbt_bit_sort_build_cmp(Datum a, Datum b, SortSupport ssup)
>> +{
>> + return DatumGetInt32(DirectFunctionCall2(byteacmp,
>> + PointerGetDatum(a),
>> + PointerGetDatum(b)));
>> +}
>
> That doesn't quite match the sort order used by the comparison
> functions, gbt_bitlt and such. The comparison functions compare the bits
> first, and use the length as a tie-breaker. Using byteacmp() will
> compare the "bit length" first. However, gbt_bitcmp() also uses
> byteacmp(), so I'm a bit confused. So, huh?

Since we used byteacmp() previously for picksplit, too, this is
consistent with the order you got previously. It still seems wrong to me
and should be investigated, but it doesn't need to block this patch.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-04-07 10:25:50 CREATE SEQUENCE with RESTART option
Previous Message Amit Kapila 2021-04-07 10:01:37 Re: New IndexAM API controlling index vacuum strategies