Re: pgsql: Add parallel-aware hash joins.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2018-01-23 23:10:56
Message-ID: 12736.1516749056@sss.pgh.pa.us
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jan 22, 2018 at 6:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Here's a possibly more useful graph of regression test timings over
>> the last year. I pulled this from the buildfarm database: it is the
>> reported runtime for the "installcheck-C" step in each successful
>> build of HEAD on dromedary, going back to Jan. 2017.

> Right, but this doesn't seem to show any big spike in the runtime at
> the time when parallel hash was committed, or when the preparatory
> patch to add test coverage for hash joins got committed. Rather,
> there's a gradual increase over time.

Well, there's just too much noise in this chart. Let's try another
machine: prairiedog, which is a lot slower, so the 1s resolution
isn't such a limiting factor, and which I know has seen hardly any
system changes.

The first attached PNG shows the "installcheck-C" runtime for as far
back as the buildfarm database has the data, and the second zooms in
on events since late 2016. As before, I've dropped individual outlier
results (those significantly slower than any nearby run) on the grounds
that they probably represent interference from nightly backups. I also
attached the raw data (including outliers) in case anyone wants to do
their own analysis.
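
For anyone redoing the analysis from the raw data, the outlier rule described above (drop a run that is significantly slower than any nearby run) could be sketched roughly as follows. This is only an illustration, not the procedure actually used for the attached graphs: the function name `drop_outliers`, the `window` of three neighbors on each side, and the 25% `factor` are all assumptions, and the input is taken to be a plain list of runtimes in seconds, in chronological order.

```python
def drop_outliers(runtimes, window=3, factor=1.25):
    """Return runtimes with isolated slow runs removed.

    A run is dropped when it is more than `factor` times slower than
    every one of its neighbors within `window` positions on either side.
    Both parameters are hypothetical defaults, not values from the post.
    """
    kept = []
    for i, value in enumerate(runtimes):
        lo = max(0, i - window)
        # Neighbors before and after position i, excluding the run itself.
        neighbors = runtimes[lo:i] + runtimes[i + 1:i + 1 + window]
        if neighbors and value > factor * max(neighbors):
            continue  # slower than any nearby run: treat as interference
        kept.append(value)
    return kept


if __name__ == "__main__":
    sample = [100, 102, 101, 180, 103, 104]
    print(drop_outliers(sample))
```

Note that comparing against the slowest neighbor means two adjacent slow runs shield each other, so this only removes isolated spikes; a genuine step change in the series is left alone.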

There is a very clear secular trend up in the longer data series,
which indicates that we're testing more stuff, which doesn't bother
me in itself as long as the time is well spent. However, the trend
over the last two months is very bad, and I do not think that we can
point to any large improvement in test coverage that someone committed
since November.

Looking more closely at the shorter series, there are four pretty obvious
step changes since 2016-09. The PNG's x-axis doesn't have enough
resolution to match these up to commits, but looking at the underlying
data, they clearly correspond to:

Branch: master Release: REL_10_BR [b801e1200] 2016-10-18 15:57:58 -0400
Improve regression test coverage for hash indexes.

Branch: master Release: REL_10_BR [4a8bc39b0] 2017-04-12 16:17:53 -0400
Speed up hash_index regression test.

Branch: master [fa330f9ad] 2017-11-29 16:06:50 -0800
Add some regression tests that exercise hash join code.

Branch: master [180428404] 2017-12-21 00:43:41 -0800
Add parallel-aware hash joins.
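
The step changes listed above were picked out by looking at the underlying data. For a mechanical cross-check, something along these lines might work. It is a rough sketch under stated assumptions, not the method used here: `find_steps`, the five-run `window`, and the 10% `min_shift` threshold are hypothetical, and the input is assumed to be a chronological list of runtimes with outliers already removed.

```python
from statistics import mean


def find_steps(runtimes, window=5, min_shift=0.10):
    """Return (index, relative_shift) pairs where the runtime level jumps.

    For each position, compare the mean of the `window` runs before it
    with the mean of the `window` runs starting at it. Adjacent positions
    that exceed `min_shift` are merged, keeping the largest shift.
    """
    candidates = []
    for i in range(window, len(runtimes) - window + 1):
        before = mean(runtimes[i - window:i])
        after = mean(runtimes[i:i + window])
        shift = (after - before) / before
        if abs(shift) >= min_shift:
            candidates.append((i, shift))

    steps = []
    group = []
    for cand in candidates:
        # A gap in the indices ends the current run of candidates.
        if group and cand[0] != group[-1][0] + 1:
            steps.append(max(group, key=lambda c: abs(c[1])))
            group = []
        group.append(cand)
    if group:
        steps.append(max(group, key=lambda c: abs(c[1])))
    return steps


if __name__ == "__main__":
    series = [100] * 10 + [120] * 10
    for index, shift in find_steps(series):
        print(f"step at run {index}: {shift:+.0%}")
```

Each reported index is the first run at the new level, so its build timestamp can then be compared against the commit log by hand.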

I thought that the hash index test case was excessively expensive for
what it covered, and I'm now thinking the same about hash joins.

regards, tom lane

Attachment Content-Type Size
image/png 4.6 KB
image/png 4.9 KB
pdog-raw.txt text/plain 126.2 KB
