Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2021-04-07 17:54:11
Message-ID: 26bb3b68-3c51-2dac-d6ce-ae321e6bc801@enterprisedb.com
Lists: pgsql-hackers


On 4/6/21 5:58 AM, David Rowley wrote:
> On Sat, 20 Mar 2021 at 09:41, James Coleman <jtc331(at)gmail(dot)com> wrote:
>> I've attached a cleaned up patch. Last CF it was listed in is
>> https://commitfest.postgresql.org/29/2542/ -- what's the appropriate
>> step to take here given it's an already existing patch, but not yet
>> moved into recent CFs?
>
> I had a look at this patch. I like the idea of using a simplehash.h
> hash table to hash the constant values so that repeat lookups can be
> performed much more quickly. However, there are quite a few places in
> the code where we execute a ScalarArrayOpExpr just once, and I'm a bit
> worried that we'll slow down expression evaluation of those cases.
>
> The two cases that I have in mind are:
>
> 1. eval_const_expressions() where we use the executor to evaluate the
> ScalarArrayOpExpr to see if the result is Const.
> 2. CHECK constraints with IN clauses and single-row INSERTs.
>
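As a toy illustration (Python, not the actual executor C code or the simplehash.h API), the tradeoff being discussed looks roughly like this: hashing makes each probe O(1) instead of O(k), but the table build is up-front work that is wasted when the expression runs only once, as in the two cases above.

```python
# Toy model of "x IN (c1, ..., ck)" evaluation. Names and structure are
# illustrative only -- the real patch builds a simplehash.h table in C.

def linear_in(value, constants):
    # Unpatched behavior: scan the constant array, O(k) per evaluation.
    return any(value == c for c in constants)

def make_hashed_in(constants):
    # Patched behavior: one-time build cost (wasted if the expression is
    # evaluated only once), then O(1) per probe.
    table = set(constants)
    return lambda value: value in table

constants = list(range(1, 17))   # mirrors the 16-value IN() list below
hashed_in = make_hashed_in(constants)

assert linear_in(7, constants) and hashed_in(7)
assert not linear_in(42, constants) and not hashed_in(42)
```

The build-versus-probe asymmetry is why a one-shot evaluation (e.g. in eval_const_expressions()) can get slower even though repeated evaluations get faster.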
> I tried to benchmark both of these but I'm struggling to get stable
> enough performance for #2, even with fsync=off. Sometimes I'm getting
> results 2.5x slower than other runs.
>
> For benchmarking #1 I'm also not too sure I'm getting stable enough
> results for them to mean anything.
>
> I was running:
>
> create table a (a int);
>
> bench.sql: explain select * from a where a
> in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);
>
> drowley(at)amd3990x:~$ pgbench -n -T 60 -j 64 -c 64 -f bench.sql -P 10 postgres
> Master (6d41dd045):
> tps = 992586.991045 (without initial connection time)
> tps = 987964.990483 (without initial connection time)
> tps = 994309.670918 (without initial connection time)
>
> Master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch
> tps = 956022.557626 (without initial connection time)
> tps = 963043.352823 (without initial connection time)
> tps = 968582.600100 (without initial connection time)
>
> This puts the patched version about 3% slower. I'm not sure how much
> of that is noise from changes in the binary layout and how much is the
> needless hashtable build done for eval_const_expressions().
>
> I wondered if we should make it the query planner's job to decide
> whether the ScalarArrayOpExpr should be hashed. I ended up with the
> attached rough-cut patch that introduces HashedScalarArrayOpExpr and
> has the query planner decide if it's going to replace
> ScalarArrayOpExpr with these HashedScalarArrayOpExpr during
> preprocess_expression(). I do think that we might want to consider
> being a bit selective about when we do these replacements. It seems
> likely that we'd want to do this for EXPRKIND_QUAL and maybe
> EXPRKIND_TARGET, but I imagine that converting ScalarArrayOpExpr to
> HashedScalarArrayOpExpr for EXPRKIND_VALUES would be a waste of time
> since those will just be executed once.
>
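The planner-side decision described above can be sketched as follows. This is a hypothetical Python model, not the patch itself: the EXPRKIND_* names mirror the symbols mentioned in the mail, and the length cutoff is an assumed value for illustration, not something taken from the patch.

```python
# Hypothetical sketch of a planner heuristic for replacing a
# ScalarArrayOpExpr with a HashedScalarArrayOpExpr. The cutoff below is
# made up; the real patch would need benchmarking to pick one.

MIN_ARRAY_LEN_FOR_HASH = 9   # assumed threshold, not from the patch

def should_hash_saop(expr_kind, num_constants):
    # Quals (and perhaps targetlists) are evaluated once per row, so the
    # one-time hash build can pay for itself.
    if expr_kind in ("EXPRKIND_QUAL", "EXPRKIND_TARGET"):
        return num_constants >= MIN_ARRAY_LEN_FOR_HASH
    # Things like EXPRKIND_VALUES are executed just once, so building a
    # hash table would be wasted setup work.
    return False

assert should_hash_saop("EXPRKIND_QUAL", 16)
assert not should_hash_saop("EXPRKIND_VALUES", 16)
```

The point of the sketch is only that the decision needs both the expression kind (how often it will be evaluated) and the array length (whether hashing can beat a short linear scan).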
> I tried the same above test with the
> v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch plus the
> attached rough-cut patch and got:
>
> master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch
> + v5-0002-Rough-cut-patch-for-HashedScalarArrayOpExpr.patch
> tps = 1167969.983173 (without initial connection time)
> tps = 1199636.793314 (without initial connection time)
> tps = 1190690.939963 (without initial connection time)
>
> I can't really explain why this became faster. I was expecting it just
> to reduce that slowdown of the v4 patch a little. I don't really see
> any reason why it would become faster. It's almost 20% faster which
> seems like too much to just be fluctuations in code alignment in the
> binary.
>

Interesting. I tried this on the "small" machine I use for benchmarking,
with the same SQL script you used, and also with IN() lists containing
10 and 100 values - i.e. fewer/more than your script, which used 16.

I only ran this with a single client - the machine has just 4 cores and
this should not be related to concurrency, so 1 client seems fine. The
averages of 10 runs, 15 seconds each, look like this:

          simple  prepared     10/s     10/p    100/s    100/p
-------------------------------------------------------------
master     21847     59476    23343    59380    11757    56488
v4         21546     57757    22864    57704    11572    57350
v4+v5      23374     56089    24410    56140    14765    55302

The first two columns are your bench.sql, with -M simple or prepared.
The other columns are 10 or 100 values, /s is simple, /p is prepared.

Compared to master:

          simple  prepared     10/s     10/p    100/s    100/p
-------------------------------------------------------------
v4        98.62%    97.11%   97.95%   97.18%   98.43%  101.52%
v4+v5    106.99%    94.31%  104.57%   94.54%  125.59%   97.90%

That seems to mostly match your observation - there's a small
performance hit (~2%) with v4 alone, although that might be due to
changes in the layout of the binary. And v4+v5 improves on that a bit
(even compared to master), although I don't see the same ~20% speedup -
I do see a +25% improvement, but only with 100 values.

It's a bit strange that in prepared mode v5 actually hurts the
performance a bit.

That being said, this is a pretty extreme test case. I'd expect that
once the table is not empty, the results will show a clear improvement.
I'll collect some of those results.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
