Re: Speeding up GIST index creation for tsvectors

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Speeding up GIST index creation for tsvectors
Date: 2021-03-19 20:49:43
Message-ID: CAFBsxsHNctpF697qz4VgJTuzYLg3UbVdKtSTbxtGWvVa6jNi0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 19, 2021 at 8:57 AM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
wrote:
>
> On Thu, 11 Mar 2021 at 04:25, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
wrote:
> > Okay, so it's hard-coded in various functions in contrib modules. In
that
> > case, let's just keep the existing coding for those. In fact, the
comments
> > that got removed by your patch specifically say: /* Using the popcount
> > functions here isn't likely to win */ ...which your testing confirmed.
The
> > new function should be used only for Gist and possibly Intarray, whose
> > default is 124. That means we can't get rid of hemdistsign(), but that's
> > fine.
>
> The comment is there for all types. Since I get the performance better
> on all the types, I have kept the pg_xorcount() call for all these
> contrib modules. I understand that since for some types default
> siglen is small, we won't get benefit. But I think we should call
> pg_xorcount() for the benefit of non-default siglen case.

The 1-7% degradation measured was from an earlier version, when
pg_xorcount_long had a lot of finicky branching and computation. Is it
still true in v3? We should answer that first. I'm interested in what
happens if you now use pg_xorcount_long in the call sites, at least in the
worst case 7% test.

> Have replaced hemdistsign() by pg_xorcount() everywhere; but with
> that, the call looks a bit clumsy because of having to type-cast the
> parameters to const unsigned char *. I noticed that the cast to
> "unsigned char *" is required only when we use the value in the
> pg_number_of_ones array. Elsewhere we can safely use 'a' and 'b' as
> "char *". So I changed the pg_xorcount() parameters from unsigned char
> * to char *.

That matches the style of that file, so +1.

> > I think the way to go is a simplified version of your 0001 (not 0002),
with
> > only a single function, for gist and intarray only, and a style that
better
> > matches the surrounding code. If you look at my xor functions in the
attached
> > text file, you'll get an idea of what it should look like. Note that it
got
> > the above performance without ever trying to massage the pointer
alignment.
> > I'm a bit uncomfortable with the fact that we can't rely on alignment,
but
> > maybe there's a simple fix somewhere in the gist code.
>
> In the attached 0001-v3 patch, I have updated the code to match it
> with surrounding code; specifically the usage of *a++ rather than
> a[i].
>
> Regarding the alignment changes... I have removed the code that
> handled the leading identically unaligned bytes, for lack of evidence
> that percentage of such cases is significant. Like I noted earlier,
> for the tsearch data I used, identically unaligned cases were only 6%.
> If I find scenarios where these cases can be significant after all and
> if we cannot do anything in the gist index code, then we might have to
> bring back the unaligned byte handling. I didn't get a chance to dig
> deeper into the gist index implementation to see why they are not
> always 8-byte aligned.

I find it stranger that something equivalent to char* is not randomly
misaligned, but rather only seems to land on 4-byte boundaries.

[thinks] I'm guessing it's because of VARHDRSZ, but I'm not positive.

FWIW, I anticipate some push back from the community because of the fact
that the optimization relies on statistical phenomena.

> I have kept the 0002 patch as-is. Due to significant *additional*
> speedup, over and above the 0001 improvement, I think the code
> re-arrangement done is worth it for non-x86 platforms.

For the amount of code uglification involved, we should be getting full asm
popcount support on Arm, not an attempt to kluge around the current
implementation. I'd be happy to review such an effort for PG15, by the way.

Readability counts, and as it stands I don't feel comfortable asking a
committer to read 0002.
--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-03-19 21:19:49 Re: [HACKERS] Custom compression methods
Previous Message James Coleman 2021-03-19 20:40:59 Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays