Re: pgsql: Add parallel-aware hash joins.

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2018-01-24 01:41:21
Message-ID: CAEepm=1c7SpPfsFtFS0gJaPYXwLu=BpCi8WX3Ws7DsuVh1KzkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Wed, Jan 24, 2018 at 12:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> There is a very clear secular trend up in the longer data series,
> which indicates that we're testing more stuff,

+1

> which doesn't bother
> me in itself as long as the time is well spent. However, the trend
> over the last two months is very bad, and I do not think that we can
> point to any large improvement in test coverage that someone committed
> since November.

I'm not sure if coverage.postgresql.org has a way to view historical
reports so we can see the actual percentage change, but as I recall
it, commit fa330f9ad "Add some regression tests that exercise hash
join code." pushed nodeHash.c and possibly nodeHashJoin.c into green
territory on here:

https://coverage.postgresql.org/src/backend/executor/index.html

> Looking more closely at the shorter series, there are four pretty obvious
> step changes since 2016-09. The PNG's x-axis doesn't have enough
> resolution to match these up to commits, but looking at the underlying
> data, they clearly correspond to:
>
> Branch: master Release: REL_10_BR [b801e1200] 2016-10-18 15:57:58 -0400
> Improve regression test coverage for hash indexes.
>
> Branch: master Release: REL_10_BR [4a8bc39b0] 2017-04-12 16:17:53 -0400
> Speed up hash_index regression test.
>
> Branch: master [fa330f9ad] 2017-11-29 16:06:50 -0800
> Add some regression tests that exercise hash join code.

Joining check runtimes with the commit log (see attached), I see:

2017-11-30 | fa330f9a | Add some regression tests that exercise | 00:08:30
2017-11-29 | 84940644 | New C function: bms_add_range | 00:08:18

That's +2.4%.

> Branch: master [180428404] 2017-12-21 00:43:41 -0800
> Add parallel-aware hash joins.

2017-12-21 | cce1ecfc | Adjust assertion in GetCurrentCommandId. | 00:09:03
2017-12-21 | 6719b238 | Rearrange execution of PARAM_EXTERN Para |
2017-12-21 | 8a0596cb | Get rid of copy_partition_key |
2017-12-21 | 9ef6aba1 | Fix typo |
2017-12-21 | c98c35cd | Avoid putting build-location-dependent s |
2017-12-21 | 59d1e2b9 | Cancel CV sleep during subtransaction ab |
2017-12-21 | 18042840 | Add parallel-aware hash joins. |
2017-12-20 | f94eec49 | When passing query strings to workers, p | 00:08:45

That's +3.4%. That's a bit more than I expected. I saw 2.5% on my
development box and hoped that'd be OK for a complex feature with a
lot of paths to test.

But hang on a minute -- how did we get to 08:45 from 08:30 between
those commits? Of course this is all noisy data and individual
samples are all over the place, but I think I see some signal here:

2017-12-20 | f94eec49 | When passing query strings to workers, p | 00:08:45
2017-12-19 | 7d3583ad | Test instrumentation of Hash nodes with | 00:08:43
2017-12-19 | 8526bcb2 | Try again to fix accumulation of paralle |
2017-12-19 | 38fc5470 | Re-fix wrong costing of Sort under Gathe | 00:08:31
2017-12-19 | 09a65f5a | Mark a few parallelism-related variables | 00:08:27

Both 8526bcb2 and 7d3583ad add Gather rescans. Doesn't 8526bcb2 add a
query that forks 40 times? I wonder how long it takes to start a
background worker on a Mac Cube. From that commit:

+ -> Gather (actual rows=9800 loops=10)
+ Workers Planned: 4
+ Workers Launched: 4
+ -> Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)

> I thought that the hash index test case was excessively expensive for
> what it covered, and I'm now thinking the same about hash joins.

Does join-test-shrink.patch (from my earlier message) help much, on
prairiedog? It cut "check" time by ~1.5% on my low end machine.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
prairiedog-check-speed-vs-commit-log.txt text/plain 31.8 KB

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Amit Kapila 2018-01-24 02:56:45 Re: pgsql: Allow UPDATE to move rows between partitions.
Previous Message Bruce Momjian 2018-01-23 23:23:07 pgsql: doc: mention psql -l uses the 'postgres' database by default

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-01-24 01:44:26 Re: Failed to request an autovacuum work-item in silence
Previous Message Masahiko Sawada 2018-01-24 01:39:32 Re: Remove utils/dsa.h from autovacuum.c