Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: James Coleman <jtc331(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2021-04-08 21:32:13
Message-ID: ab5b8dc0-3270-e13c-859f-463b9b9237c3@enterprisedb.com
Lists: pgsql-hackers

On 4/8/21 2:00 PM, David Rowley wrote:
> On Thu, 8 Apr 2021 at 22:54, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>>
>> I think the changes in the patch are fairly isolated and the test
>> coverage is now pretty good. I'm planning on looking at the patch
>> again now and will consider pushing it for PG14.
>
> I pushed this with some minor cleanup from the v6 patch I posted earlier.
>

I ran the same set of benchmarks on v6, which I think should be mostly
identical to what was committed. I extended the test a bit to cover
tables with 0, 1, 5 and 1000 rows, and both int and text values, to see
how it works with more expensive comparators.

I built binaries with gcc 9.2.0 and clang 11.0.0; the full results are
attached. There's a bit of difference between gcc and clang, but the
general behavior is about the same, so I'll only present the gcc results
to keep this simple. I'll also only show throughput relative to master,
so values above 100% mean the patch is faster and values below 100% mean
it's slower. If you're interested in actual tps, see the full results.

For the v5 patch (actually v4-0001 + v5-0002) and v6, the results are
below. Column headers give the array length and query protocol
(p = prepared, s = simple):

integer column / v5
===================

rows 10/p 100/p 16/p 10/s 100/s 16/s
-----------------------------------------------------------
0 97% 97% 97% 107% 126% 108%
1 95% 82% 94% 108% 132% 110%
5 95% 83% 95% 108% 132% 110%
1000 129% 481% 171% 131% 382% 165%

integer column / v6
===================

rows 10/p 100/p 16/p 10/s 100/s 16/s
-----------------------------------------------------------
0 97% 97% 97% 98% 98% 98%
1 96% 84% 95% 97% 97% 98%
5 97% 85% 96% 98% 97% 97%
1000 129% 489% 172% 128% 330% 162%

text column / v5
================

rows 10/p 100/p 16/p 10/s 100/s 16/s
-----------------------------------------------------------
0 100% 100% 100% 106% 119% 108%
1 96% 81% 95% 107% 120% 109%
5 97% 82% 96% 107% 121% 109%
1000 291% 1622% 402% 255% 1092% 337%

text column / v6
================

rows 10/p 100/p 16/p 10/s 100/s 16/s
-----------------------------------------------------------
0 101% 101% 101% 98% 99% 99%
1 98% 82% 96% 98% 96% 97%
5 100% 84% 98% 98% 96% 98%
1000 297% 1645% 408% 255% 1000% 336%

Overall, the behavior for integer and text columns is the same for both
patches. There are a couple of interesting observations:

1) For the "simple" query mode, v5 helped quite a bit (20-30% speedup),
but v6 does not seem to help at all - it's either the same as or slower
than unpatched master.

I wonder why that is, and whether we could get some of that speedup with
v6 too? At first I thought that maybe v5 is not building the hash table
in cases where v6 does, but then it shouldn't be faster than master.

2) For the "prepared" mode, there's a clear performance hit the longer
the array is (for both v5 and v6). For 100 elements it's about 15%,
which is not great.
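The trade-off behind both observations can be sketched in a few lines -
a toy model, not PostgreSQL's actual implementation, with made-up
function names:

```python
# Toy sketch of the two ScalarArrayOpExpr evaluation strategies
# (illustrative only - not PostgreSQL's actual code).

def saop_linear(rows, constants):
    # Compare each row value against the array until a match is found;
    # cost per row grows with the array length.
    return [any(v == c for c in constants) for v in rows]

def saop_hashed(rows, constants):
    # Pay the build cost once up front, then each row is a single probe.
    # With only a handful of rows the build cost is never amortized.
    lookup = set(constants)
    return [v in lookup for v in rows]

constants = list(range(100))
rows = [5, 250, 99]
assert saop_linear(rows, constants) == saop_hashed(rows, constants)
```

Both strategies return the same answers; the difference is purely where
the cost lands - per-row comparisons vs. a one-time build.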

I think the reason is fairly simple - building the hash table is not
free, and with few rows it's not worth it - it'd be faster to just
search the array directly. Unfortunately, the logic that decides whether
to switch to hashing looks at the array length only, and ignores the
number of rows entirely. So I think if we want to address this,
convert_saop_to_hashed_saop needs to compare

hash_build_cost + nrows * hash_lookup_cost

and

nrows * linear_lookup_cost

to make a reasonable decision.
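A minimal sketch of such a cost-based decision - the cost constants here
are entirely made up for illustration, and the function name is
hypothetical:

```python
# Hypothetical cost model: hashing wins only once the one-time build
# cost is amortized over enough row lookups. All constants are assumed.

HASH_BUILD_COST_PER_ELEM = 1.0   # insert one array element into the table
HASH_LOOKUP_COST = 1.5           # one hash probe per row
LINEAR_CMP_COST = 1.0            # one comparison in a linear array scan

def should_hash(array_len, nrows):
    """True if a hashed SAOP is expected to beat a linear scan."""
    hash_cost = array_len * HASH_BUILD_COST_PER_ELEM + nrows * HASH_LOOKUP_COST
    # Pessimistic bound: a non-matching row compares against every element.
    linear_cost = nrows * array_len * LINEAR_CMP_COST
    return hash_cost < linear_cost

# With 100 array elements, hashing loses for a single row
# but wins easily for 1000 rows:
assert not should_hash(100, 1)
assert should_hash(100, 1000)
```

The point is just that both the array length and the expected row count
have to enter the comparison; the exact constants would need calibration.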

I was thinking that maybe we can ignore this, because people probably
have much larger tables in practice. But I'm not sure that's really
true, because there may be other quals and it's possible the preceding
ones are quite selective, filtering most of the rows.

I'm not sure how much of the necessary information we have available in
convert_saop_to_hashed_saop_walker, though :-( I suppose we know the
number of input rows for that plan node, but probably not the
selectivity of the other quals.

It's also a bit strange that we get a speedup for the "simple" protocol,
while "prepared" gets slower. That seems counter-intuitive - why should
we see opposite outcomes in those two cases? I'd expect either a speedup
or a slowdown in both, with the relative change merely being more
significant in the "prepared" mode.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
saop gcc.ods application/vnd.oasis.opendocument.spreadsheet 34.9 KB
saop llvm.ods application/vnd.oasis.opendocument.spreadsheet 26.4 KB
