Re: Speeding up GIST index creation for tsvectors

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Speeding up GIST index creation for tsvectors
Date: 2021-01-27 15:06:28
Message-ID: CAJ3gD9chdDRpz_1rAwfoF0okEznN9o6O89HdZbB0NXHkFSp7Ag@mail.gmail.com
Lists: pgsql-hackers

On Tue, 15 Dec 2020 at 20:34, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>
> On Sun, 13 Dec 2020 at 9:28 PM, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > +1
> > This will make all INSERTs and UPDATEs for tsvector's GiSTs faster.
>
> Oh, I didn't realize that this code is getting used in GIST index
> insertion and creation too. Will check there.

I ran some insert and update tests; they show only a marginal
improvement. So it looks like the patch mainly improves index builds.

> > Meanwhile there are at least 4 incarnations of hemdistsign() functions that are quite similar. I'd propose to refactor them somehow...
>
> Yes, I hope we get the benefit there also. Before that, I thought I
> should post the first use-case to get some early comments. Thanks for
> your encouraging comments :)

The attached v2 of the 0001 patch extends the hemdistsign() changes
to the other use cases, such as intarray, ltree and hstore. I see the
same index build improvement for all of these types.

Since the default siglen for GiST index creation on some of these
types is small (8-20 bytes), I also tested with small siglens. For
siglen <= 20, particularly for values that are not multiples of 8
(e.g. 10 or 13), index creation is 1-7% slower. This is probably due
to the extra function call to pg_xorcount(), and possibly also to the
extra logic inside pg_xorcount(), which becomes significant for
shorter traversals. So for siglen less than 32, I kept the existing
byte-by-byte traversal.
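To make the trade-off concrete, here is a minimal sketch of the two strategies and a siglen cutoff. It assumes GCC or Clang (for __builtin_popcount and __builtin_popcountll). The names hemdist_bytewise(), hemdist_wordwise(), hemdist() and HEMDIST_WORDWISE_MIN_SIGLEN are hypothetical, and this is not the code of the patch's pg_xorcount() or hemdistsign(). It only illustrates the idea: XOR the signatures 8 bytes at a time and popcount each word. Siglens that are not multiples of 8 still need a byte loop for the leftover bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff mirroring the post: byte loop below 32 bytes. */
#define HEMDIST_WORDWISE_MIN_SIGLEN 32

/* Byte-at-a-time Hamming distance between two siglen-byte signatures. */
static int
hemdist_bytewise(const unsigned char *a, const unsigned char *b, int siglen)
{
	int			dist = 0;

	for (int i = 0; i < siglen; i++)
		dist += __builtin_popcount((unsigned int) (a[i] ^ b[i]));
	return dist;
}

/*
 * XOR 8 bytes at a time and popcount each 64-bit word; the 0-7 leftover
 * bytes (siglen not a multiple of 8, e.g. 10 or 13) use the byte loop.
 */
static int
hemdist_wordwise(const unsigned char *a, const unsigned char *b, int siglen)
{
	int			dist = 0;
	int			i = 0;

	for (; i + 8 <= siglen; i += 8)
	{
		uint64_t	wa;
		uint64_t	wb;

		/* memcpy avoids any alignment requirement on the input buffers */
		memcpy(&wa, a + i, sizeof(wa));
		memcpy(&wb, b + i, sizeof(wb));
		dist += __builtin_popcountll(wa ^ wb);
	}
	return dist + hemdist_bytewise(a + i, b + i, siglen - i);
}

/* Choose the strategy by signature length (hypothetical dispatcher). */
static int
hemdist(const unsigned char *a, const unsigned char *b, int siglen)
{
	assert(siglen >= 0);
	if (siglen < HEMDIST_WORDWISE_MIN_SIGLEN)
		return hemdist_bytewise(a, b, siglen);
	return hemdist_wordwise(a, b, siglen);
}
```

Both paths return the same count for every siglen. The cutoff only decides which one is cheaper. The value 32 comes from the post, but where the word-wise loop starts to win depends on the compiler and CPU, so any such threshold should be measured rather than assumed.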

--
Thanks,
-Amit Khandekar
Huawei Technologies

Attachment Content-Type Size
0002-Avoid-function-pointer-dereferencing-for-pg_popcount.patch text/x-patch 7.1 KB
0001-Speed-up-xor-ing-of-two-gist-index-signatures-for-ts-v2.patch text/x-patch 6.9 KB
