Re: Yet another fast GiST build

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Yet another fast GiST build
Date: 2020-09-09 07:05:04
Message-ID: CAC8Q8t+0sHUWZTpk23QnW5UFAn1kuNq+YhM6encWxZXWz32e4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, Sep 9, 2020 at 9:43 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru>
wrote:

>
>
> > 9 сент. 2020 г., в 00:05, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> написал(а):
> >
> > I've been reviewing the patch today. The biggest changes I've made have
> been in restructuring the code in gistbuild.c for readability, but there
> are a bunch of smaller changes throughout. Attached is what I've got so
> far, squashed into one patch.
> Thanks!
>
> > I'm continuing to review it, but a couple of questions so far:
> >
> > In the gistBuildCallback(), you're skipping the tuple if 'tupleIsAlive
> == false'. That seems fishy, surely we need to index recently-dead tuples,
> too. The normal index build path isn't skipping them either.
> That's an oversight.
> >
> > How does the 'sortsupport' routine interact with
> 'compress'/'decompress'? Which representation is passed to the comparator
> routine: the original value from the table, the compressed representation,
> or the decompressed representation? Do the comparetup_index_btree() and
> readtup_index() routines agree with that?
>
> Currently we pass compressed values, which seems not very good.
> But there was a request from PostGIS maintainers to pass values before
> decompression.
> Darafei, please, correct me if I'm wrong. Also can you please provide link
> on PostGIS B-tree sorting functions?
>

We were expecting to reuse btree opclass for this thing. This way
btree_gist extension will become a lot thinner. :)

Core routine for current sorting implementation is Hilbert curve, which is
based on 2D center of a box - and used for abbreviated sort:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gbox.c#L893

All the btree functions are wrappers around gserialized_cmp which just adds
a bunch of tiebreakers that don't matter in practice:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gserialized.c#L313

Base representation for index compressed datatype is GIDX, which is also a
box. We can make it work on top of it instead of the original
representation.
There is no such thing as "decompressed representation" unfortunately as
compression is lossy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-09-09 07:11:23 Re: Inconsistency in determining the timestamp of the db statfile.
Previous Message Peter Smith 2020-09-09 06:53:31 Re: Improvements in Copy From