Re: Speeding up GIST index creation for tsvectors

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Speeding up GIST index creation for tsvectors
Date: 2021-03-03 18:02:18
Message-ID: CAFBsxsH6E-R57KgGSgbx-8N5tMq+xoLtuT5utwTz0Va5sb5GnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 27, 2021 at 11:07 AM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
wrote:

Hi Amit,

Your performance numbers look like this is a fruitful area to improve. I
have not yet tested performance, but I will do so at a later date. I did
some microbenchmarking of our popcount implementation, since I wasn't quite
sure it's optimal, and indeed, there is room for improvement there [1]. I'd
be curious to hear your thoughts on those findings. I think it'd be worth
it to test a version of this patch using those idioms here as well, so at
some point I plan to post something.

Now for the patches:

0001:

+ /*
+ * We can process 64-bit chunks only if both are mis-aligned by the same
+ * number of bytes.
+ */
+ if (b_aligned - b == a_aligned - a)

The obvious question here is: how often are they identically misaligned?
You don't indicate that your measurements differ in a bimodal fashion, so
does that mean they happen to be always (mis)aligned? Is that an accident
of the gist coding and could change unexpectedly? And how hard would it be
to allocate those values upstream so that the pointers are always aligned
on 8-byte boundaries? (I imagine pretty hard, since there are multiple
callers, and such tight coupling is not good style)

+ /* For smaller lengths, do simple byte-by-byte traversal */
+ if (bytes <= 32)

You noted upthread:

> Since for the gist index creation of some of these types the default
> value for siglen is small (8-20), I tested with small siglens. For
> siglens <= 20, particularly for values that are not multiples of 8
> (e.g. 10, 13, etc), I see a 1-7 % reduction in speed of index
> creation. It's probably because of
> an extra function call for pg_xorcount(); and also might be due to the
> extra logic in pg_xorcount() which becomes prominent for shorter
> traversals. So for siglen less than 32, I kept the existing method
> using byte-by-byte traversal.

I wonder if that can be addressed directly, while cleaning up the loops and
alignment checks in pg_xorcount_long() a little. For example, have a look
at pg_crc32c_armv8.c -- it might be worth testing a similar approach.

Also, pardon my ignorance, but where can I find the default siglen for
various types?

+/* Count the number of 1-bits in the result of xor operation */
+extern uint64 pg_xorcount_long(const unsigned char *a, const unsigned char
*b,
+ int bytes);
+static inline uint64 pg_xorcount(const unsigned char *a, const unsigned
char *b,
+ int bytes)

I don't think it makes sense to have a static inline function call a global
function.

-static int
+static inline int
hemdistsign(BITVECP a, BITVECP b, int siglen)

Not sure what the reason is for inlining all these callers. Come to think
of it, I don't see a reason to have hemdistsign() at all anymore. All it
does is call pg_xorcount(). I suspect that's what Andrey Borodin meant when
he said upthread:

> > > Meanwhile there are at least 4 incarnation of hemdistsign() functions
that are quite similar. I'd propose to refactor them somehow...

0002:

I'm not really happy with this patch. I like the idea of keeping indirect
calls out of non-x86 platforms, but I believe it could be done more simply.
For one, I don't see a need to invent a third category of retail function.
Second, there's no reason to put "nonasm" in the header as a static inline,
and then call from there to the new now-global function "slow". Especially
since the supposed static inline is still needed as a possible value as a
function pointer on x86, so the compiler is not going to inline it on x86
anyway. That just confuses things. (I did make sure to remove indirect
calls from the retail functions in [1], in case we want to go that route).

[1]
https://www.postgresql.org/message-id/CAFBsxsFCWys_yfPe4PoF3%3D2_oxU5fFR2H%2BmtM6njUA8nBiCYug%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2021-03-03 18:05:17 Re: Table AM modifications to accept column projection lists
Previous Message Vik Fearing 2021-03-03 17:52:04 Re: Catalog version access