Re: pgsql: Add parallel-aware hash joins.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2018-01-24 19:31:47
Message-ID: 14488.1516822307@sss.pgh.pa.us
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2018-01-24 13:11:22 -0500, Robert Haas wrote:
>> Now, how much should we care about the performance of software with a
>> planned release date of 2018 on hardware discontinued in 2001,
>> hardware that is apparently about 20 times slower than a modern
>> laptop? Some, perhaps, but maybe not a whole lot. Removing tests
>> that have found actual bugs because they cost runtime on ancient
>> systems that nobody uses for serious work doesn't make sense to me.

> I again agree with the sentiment.

I find that to be a completely bogus straw-man argument. The point of
looking at the prairiedog time series is just to see a data series in
which the noise level is small enough to discern the signal. If anyone's
got years worth of data off a more modern machine, and they can extract
a signal from that, by all means let's consider that data instead. But
there's no clear argument (or at least you have not made one) that says
that prairiedog's relative timings don't match what we'd get on more
modern machines.

Now, what *is* a relevant objection is that most of us care more about
the runtime of the parallelized regression tests than serial tests.
I did not use prairiedog's "make check" timings in these graphs because
that would include "make install" and initdb timings, adding noise and
overhead and probably making it harder to see what's going on. But
it's perfectly fair to want to optimize that case, not the serial case.

However ... if you spend any time looking at the behavior of that,
the hashjoin tests are still problematic. I instrumented the parallel
tests by turning on log_disconnections so as to get per-test-script
timings, and what I find to be the slowest steps on my development
workstation are

[pg_regress/rowsecurity] 0.517
[pg_regress/partition_join] 0.535
[pg_regress/updatable_views] 0.546
[pg_regress/stats] 0.566
[pg_regress/triggers] 0.618
[pg_regress/foreign_data] 0.642
[pg_regress/stats_ext] 0.670
[pg_regress/select_parallel] 0.828
[pg_regress/create_index] 0.916
[pg_regress/alter_table] 1.187
[pg_regress/gist] 1.283
[pg_regress/join] 1.923
[pg_regress/plpgsql] 3.100

(The overall runtime for "make installcheck-parallel" on this machine
is about 17.3 seconds right now.) Since the scripts in a parallel group
run concurrently, a group's wall-clock time is that of its slowest
member. The next slowest script in the join test's group is "update",
at 0.373 seconds; so over 1.5 sec of the total 17.3 sec runtime is
attributable solely to the join script.
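
For the record, that instrumentation boils down to something like the
following (a sketch only: the exact settings, log location, and parsing
will vary with your setup; pg_regress sets each script's application_name
to "pg_regress/<script>", which is where the bracketed names above come
from):

    # turn on per-session timing in the test server (takes effect on reload)
    psql -d postgres -c "ALTER SYSTEM SET log_disconnections = on"
    psql -d postgres -c "ALTER SYSTEM SET log_line_prefix = '[%a] '"
    pg_ctl -D "$PGDATA" reload

    make installcheck-parallel

    # each disconnection line reports that script's total session time
    # (assumes logging_collector is writing under $PGDATA/log)
    grep 'disconnection:' "$PGDATA"/log/*.log \
        | sed -E 's/.*\[([^]]+)\].*session time: ([0-9:.]+).*/\2 [\1]/' \
        | sort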

Running the same test on the v10 branch, the slowest steps are

[pg_regress/join] 0.521
[pg_regress/rowsecurity] 0.521
[pg_regress/updatable_views] 0.554
[pg_regress/triggers] 0.624
[pg_regress/foreign_data] 0.647
[pg_regress/stats_ext] 0.675
[pg_regress/select_parallel] 0.690
[pg_regress/create_index] 0.928
[pg_regress/gist] 1.020
[pg_regress/alter_table] 1.120
[pg_regress/plpgsql] 3.217

so join has gotten about 1.4 seconds slower since v10 (1.923 vs. 0.521
seconds), and despite parallelism that time comes entirely out of
developers' hides, because join was already the slowest script in its
group.

So I continue to maintain that an unreasonable fraction of the total
resources devoted to the regular regression tests is going into these
new hashjoin tests.

Based on these numbers, it seems like one easy thing we could do to
reduce parallel check time is to split the plpgsql test into several
scripts that could run in parallel. But independently of that,
I think we need to make an effort to push hashjoin's time back down.
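
For reference, the schedule format makes such a split mechanical: all
scripts named on one "test:" line of src/test/regress/parallel_schedule
run concurrently, so a group finishes only when its slowest member does.
A hypothetical split (the new script names are invented for illustration,
not a concrete proposal) might look like:

    # before (group abridged): plpgsql's 3.1 sec sets the group's wall time
    test: plancache limit plpgsql ...

    # after: several roughly-equal pieces run side by side, so the group's
    # ceiling drops to its next-slowest member
    test: plancache limit plpgsql_control plpgsql_cursor plpgsql_misc ...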

> One caveat is that old machines also
> somewhat approximate testing with more instrumentation / debugging
> enabled (say valgrind, CLOBBER_CACHE_ALWAYS, etc.). So removing excessive
> test overhead still has considerable benefit. But I definitely do not
> want to lower coverage to achieve it.

I don't want to lower coverage either. I do want some effort to be
spent on achieving test coverage intelligently, rather than just throwing
large test cases at the code without consideration of the costs.

regards, tom lane
