From: | Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net> |
---|---|
To: | "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Yet another fast GiST build |
Date: | 2020-09-09 07:05:04 |
Message-ID: | CAC8Q8t+0sHUWZTpk23QnW5UFAn1kuNq+YhM6encWxZXWz32e4Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Wed, Sep 9, 2020 at 9:43 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru>
wrote:
>
>
> > 9 сент. 2020 г., в 00:05, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> написал(а):
> >
> > I've been reviewing the patch today. The biggest changes I've made have
> been in restructuring the code in gistbuild.c for readability, but there
> are a bunch of smaller changes throughout. Attached is what I've got so
> far, squashed into one patch.
> Thanks!
>
> > I'm continuing to review it, but a couple of questions so far:
> >
> > In the gistBuildCallback(), you're skipping the tuple if 'tupleIsAlive
> == false'. That seems fishy, surely we need to index recently-dead tuples,
> too. The normal index build path isn't skipping them either.
> That's an oversight.
> >
> > How does the 'sortsupport' routine interact with
> 'compress'/'decompress'? Which representation is passed to the comparator
> routine: the original value from the table, the compressed representation,
> or the decompressed representation? Do the comparetup_index_btree() and
> readtup_index() routines agree with that?
>
> Currently we pass compressed values, which seems not very good.
> But there was a request from PostGIS maintainers to pass values before
> decompression.
> Darafei, please, correct me if I'm wrong. Also can you please provide link
> on PostGIS B-tree sorting functions?
>
We were expecting to reuse btree opclass for this thing. This way
btree_gist extension will become a lot thinner. :)
Core routine for current sorting implementation is Hilbert curve, which is
based on 2D center of a box - and used for abbreviated sort:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gbox.c#L893
All the btree functions are wrappers around gserialized_cmp which just adds
a bunch of tiebreakers that don't matter in practice:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gserialized.c#L313
Base representation for index compressed datatype is GIDX, which is also a
box. We can make it work on top of it instead of the original
representation.
There is no such thing as "decompressed representation" unfortunately as
compression is lossy.
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2020-09-09 07:11:23 | Re: Inconsistency in determining the timestamp of the db statfile. |
Previous Message | Peter Smith | 2020-09-09 06:53:31 | Re: Improvements in Copy From |