Re: pgsql: Add parallel-aware hash joins.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2018-01-24 18:11:22
Message-ID: CA+TgmoYvHMv-9TKvPAotiYhv_Opj6HSJvkvQ6ReeYsOJceZSqQ@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Tue, Jan 23, 2018 at 6:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Looking more closely at the shorter series, there are four pretty obvious
> step changes since 2016-09. The PNG's x-axis doesn't have enough
> resolution to match these up to commits, but looking at the underlying
> data, they clearly correspond to:
>
> Branch: master Release: REL_10_BR [b801e1200] 2016-10-18 15:57:58 -0400
> Improve regression test coverage for hash indexes.
>
> Branch: master Release: REL_10_BR [4a8bc39b0] 2017-04-12 16:17:53 -0400
> Speed up hash_index regression test.
>
> Branch: master [fa330f9ad] 2017-11-29 16:06:50 -0800
> Add some regression tests that exercise hash join code.
>
> Branch: master [180428404] 2017-12-21 00:43:41 -0800
> Add parallel-aware hash joins.
>
> I thought that the hash index test case was excessively expensive for
> what it covered, and I'm now thinking the same about hash joins.

Hmm. I guess I'm insulated from some of the problem here by my choice
of hardware. On my laptop, 'make check' takes between 25.5 and 26
seconds (on 28e04155f17cabda7a18aee31d130aa10e25ee86). If I remove
the hash_index test from parallel_schedule, it still takes between
25.5 and 26 seconds. If I also remove the join test in its entirety,
it drops down to 24-24.5 seconds. If I put hash_index and join back
in the schedule file but revert join.sql and join.out to the version
just before fa330f9ad, it takes about 24.5 seconds. So for me, the
additional hash index tests don't cost anything measurable and the
additional hash join tests cost about a second. I think this probably
accounts for why committers other than you keep "adding so much time
to the regression tests". On modern hardware, the costs just don't
matter. As a further point of reference, on this machine, 9.5 stable
is 24.5-25 seconds, and 9.3 is 25.5-26 seconds, so from here it looks
like the speed of 'make check' is within a half second or so of what
we had 5 years ago, even though the volume of the regression tests in
terms of lines of SQL code has increased by more than 50% in the same
time period.
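
For anyone who wants to repeat this kind of comparison, here is a minimal
sketch of how the timing runs could be scripted. It is not what was used for
the numbers above; the function names time_command and summarize are
hypothetical, and it assumes it is run from the top of an already-built
source tree with the schedule file edited by hand between runs.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output; only the elapsed time matters here.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (min, max, median) so a range such as 25.5-26 s can be reported."""
    return min(durations), max(durations), statistics.median(durations)


if __name__ == "__main__":
    # Assumption: the default command is 'make check' in the current directory;
    # any other command can be passed on the command line instead.
    cmd = sys.argv[1:] or ["make", "check"]
    lo, hi, mid = summarize(time_command(cmd))
    print(f"{' '.join(cmd)}: {lo:.1f}-{hi:.1f} s (median {mid:.1f} s)")
```

Reporting the minimum and maximum over a few runs, rather than a single
figure, makes it easier to tell a real one-second difference from run-to-run
noise.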

Now, how much should we care about the performance of software with a
planned release date of 2018 on hardware discontinued in 2001,
hardware that is apparently about 20 times slower than a modern
laptop? Some, perhaps, but maybe not a whole lot. Removing tests
that have found actual bugs because they cost runtime on ancient
systems that nobody uses for serious work doesn't make sense to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
