Re: Yet another fast GiST build (typo)

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Erik Rijkers <er(at)xs4all(dot)nl>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Yet another fast GiST build (typo)
Date: 2020-08-14 09:21:32
Message-ID: CALT9ZEHfTYtfUeijSxwPodgzAfiZE-1o14da39mjC1_tHxhmhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I see this feature quite useful in concept and decided to test it.

On a real database of 7 million rows I observed speedup of 4 times in case
of single column index on points only and 2.5 times speedup in case of
index on points with several included columns. Standard deviation between
in series of measurements being of 10%. Index size saving was of 1.7-1.5
times respectively. Points were in all four quadrants.

On random points same as query in the original message it was observer 3
times speedup with the patch. Then I generated same points set but so they
will get into one quadrant didn't observe a difference with the previous
case. So probably anomaly in Morton curve not so big to induce noticeable
slowdown in a whole random set. But as the ordering is done only for index
and not used outside index it seems to me possible to introduce shifting
floating point coordinates respective to leftmost-bottom corner point and
thus make all of them positive to avoid anomaly of Morton curve near
quadrants transitions.

Of course speed measurements depend on machine and configuration a lot, but
I am sure anyway there is a noticeable difference in index build time and
this is quite valuable for end-user who build GiSt index on point type of
data. Furthermore same speedup is also for REBUILD INDEX CONCURRENTLY.
There short rebuild time also mean fewer user modified table rows during
rebuild which should be integrated in a newer index after rebuild.

This patch can be also seen as a step to futher introduce the other
ordering algoritms e.g. Gilbert curve and I consider this feature is useful
and is worth to be committed.

Both patches 0001 and 0002 when applied on version 14dev compile and work
cleanly. Regression tests are passed.
Code seems clean and legible for me.

In declaration I see little bit different style in similar argument
pointers/arrays:
extern IndexTuple gistFormTuple(GISTSTATE *giststate, Relation r, Datum
*attdata, bool *isnull, bool isleaf);
extern IndexTuple gistCompressValusAndFormTuple(GISTSTATE *giststate,
Relation r, Datum attdata[], bool isnull[], bool isleaf, Datum compatt[]);
I suppose this is because gistFormTuple previously had different style in
declaration and definition. Maybe it would be nice to change them to one
style in all code, I propose pointers instead of [].

In a big comment
/*
+ * In this function we need to compute Morton codes for non-integral
+ * components p->x and p->y. But Morton codes are defined only for
+ * integral values.
i don't quite caught meaning of "non-integral" and "integral" and propose
to replace it to "float" and "integers".

Also there are some extra spaces before line
"prev_level_start = level_start;"
and after
"The argument is a pointer to a <structname>SortSupport</structname>
struct."

Overall I see the patch useful and almost ready for commit.

вт, 4 авг. 2020 г. в 21:28, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru>:

>
>
> > 30 июля 2020 г., в 06:26, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> написал(а):
> >
> > On Fri, Jul 10, 2020 at 6:55 PM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru>
> wrote:
> >> Thanks! Fixed.
> >
> > It's not a bug, but I think those 64 bit constants should be wrapped
> > in UINT64CONST(), following our convention.
> Thanks, fixed!
>
> > I'm confused about these two patches: 0001 introduces
> > gist_point_fastcmp(), but then 0002 changes it to gist_bbox_fastcmp().
> > Maybe you intended to keep both of them? Also 0002 seems to have
> > fixups for 0001 squashed into it.
> Indeed, that were fixups: point converted to GiST representation is a bbox
> already, and the function expects only bboxes.
>
> Also I've fixed some mismerges in documentation.
>
> Thanks!
>
> Best regards, Andrey Borodin.
>
>

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-08-14 10:36:31 Re: display offset along with block number in vacuum errors
Previous Message Julien Rouhaud 2020-08-14 09:02:35 Re: Collation versioning