From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | James Coleman <jtc331(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays |
Date: | 2021-04-08 06:50:29 |
Message-ID: | CAApHDvrw21K-Os3FKobrNvvJ0yp+eq6N7OET2QnwxGS=LvOyzg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 8 Apr 2021 at 05:54, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> I only ran that with a single client, the machine only has 4 cores and
> this should not be related to concurrency, so 1 client seems fine. The
> average of 10 runs, 15 seconds each look like this:
Thanks for running these tests. The reason I added so much
concurrency was that this AMD machine has some weird behaviour in
regards to power management. Every bios update I seem to get changes
the power management but it still is very unstable despite me having
it on the most optimal settings in the bios. Working it a bit harder
seems to help make it realise that there might be some urgency.
> simple prepared 10/s 10/p 100/s 100/p
> -------------------------------------------------------------
> master 21847 59476 23343 59380 11757 56488
> v4 21546 57757 22864 57704 11572 57350
> v4+v5 23374 56089 24410 56140 14765 55302
>
> The first two columns are your bench.sql, with -M simple or prepared.
> The other columns are 10 or 100 values, /s is simple, /p is prepared.
>
> Compared to master:
>
> simple prepared 10/s 10/p 100/s 100/p
> -------------------------------------------------------------
> v4 98.62% 97.11% 97.95% 97.18% 98.43% 101.52%
> v4+v5 106.99% 94.31% 104.57% 94.54% 125.59% 97.90%
>
> That seems to mostly match your observation - there's a small
> performance hit (~2%), although that might be due to changes in the
> layout of the binary. And v4+v5 improves that a bit (even compared to
> master), although I don't see the same 20% speedup.
I've spent more time hacking at this patch. I had a bit of a change
of heart earlier about having this new HashedScalarArrayOpExpr node
type. There were more places that I imagined that I needed to add
handling for it. For example, partprune.c needed to know about it to
allow partition pruning on them. While supporting that is just a few
lines to make a recursive call passing in the underlying
ScalarArrayOpExpr, I just didn't like the idea.
Instead, I think it'll be better just to add a new field to
ScalarArrayOpExpr and have the planner set that to tell the executor
that it should use a hash table to perform the lookups rather than a
linear search. This can just be the hash function oid, which also
saves the executor from having to look that up.
After quite a bit of hacking, I've ended up with the attached. I
added the required JIT code to teach the jit code about
EEOP_HASHED_SCALARARRAYOP.
I also wrote the missing regression test for non-strict equality ops
and moved the declaration for the simplehash.h code into
execExprInterp.c and forward declared ScalarArrayOpExprHashTable in
exprExpr.h. I also rewrote a large number of comments and fixed a few
things like missing permission checks for the hash function.
I've not done any further performance tests yet but will start those now.
David
Attachment | Content-Type | Size |
---|---|---|
saop_hash_v6.patch | application/octet-stream | 33.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2021-04-08 06:58:02 | Re: SQL-standard function body |
Previous Message | Kohei KaiGai | 2021-04-08 06:48:15 | Re: TRUNCATE on foreign table |