Re: pgsql: Add parallel-aware hash joins.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2018-01-24 18:20:18
Message-ID: 20180124182018.jx2h4wp6twqo7xix@alap3.anarazel.de
Lists: pgsql-committers pgsql-hackers

On 2018-01-23 14:24:56 -0500, Robert Haas wrote:
> Right, but this doesn't seem to show any big spike in the runtime at
> the time when parallel hash was committed, or when the preparatory
> patch to add test coverage for hash joins got committed. Rather,
> there's a gradual increase over time. Either we're making the server
> slower (which would be bad) or we're adding proper test coverage for
> all the new features that we're adding (which would be good). We
> can't expect every feature patch to preserve the runtime of the tests
> absolutely unchanged; figuring out what can be optimized is a separate
> exercise from adding test coverage either for new things or for things
> that weren't previously covered.

Agreed.

On the improvement front, my observation is that we are rarely actually
CPU bound across processes. One thing I've been wondering is whether we
can get a pretty large win from just reordering
parallel_schedule. There definitely are individual test files that take a
lot longer than others, but their positioning in groups doesn't
necessarily reflect that.

Besides manually reordering the schedule, I think it might be time that
we improve pg_regress's scheduling. One big first step would e.g. be to
not manually limit the number of parallel tests in a group to 20, but
instead allow larger groups and only run a limited number of them in
parallel. If done right we could start the next test in a group as soon
as *one* task in a group has finished, rather than waiting for all of
them to finish as we currently do for (sub-)groups.
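
To make the difference concrete, here is a minimal simulation sketch. It is
not pg_regress code: the function names and the test durations are made up
for illustration. It compares running a large group in fixed sub-groups,
each waiting for its slowest member, against keeping a fixed number of
tests running and starting the next one whenever any slot frees up.

```python
import heapq

def batch_makespan(durations, limit):
    """Fixed sub-groups of `limit` tests; each waits for its slowest test."""
    return sum(max(durations[i:i + limit])
               for i in range(0, len(durations), limit))

def sliding_makespan(durations, limit):
    """Keep up to `limit` tests running; start the next when a slot frees up."""
    running = []  # min-heap of finish times of the tests currently running
    finish = 0
    for d in durations:
        # If all slots are busy, the next test starts when the earliest one ends.
        start = heapq.heappop(running) if len(running) >= limit else 0
        end = start + d
        heapq.heappush(running, end)
        finish = max(finish, end)
    return finish

# Hypothetical durations (arbitrary units): one slow test, five fast ones.
durations = [10, 1, 1, 1, 1, 1]
print(batch_makespan(durations, 2), sliding_makespan(durations, 2))
```

With these invented numbers the sub-group approach takes 12 units and the
sliding approach takes 10, because the fast tests run alongside the slow one
instead of queueing behind it.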

Besides larger groups and starting the next test(s) earlier, another way to
gain pretty large improvements would be a test schedule feature that
allows stating dependencies between tests. So instead of manually
grouping the schedule, have 'numerology' state that it depends on int2,
int4, int8, float4, float8, which means it can actually be started
earlier than it currently can in many cases.
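
A rough sketch of what dependency-driven scheduling could compute, assuming
unlimited parallelism and a schedule expressed as a mapping from a test to
the tests it depends on. The format, the function name, the durations and
the unrelated 'slow_test' entry are all hypothetical; only the numerology
dependency list comes from the example above.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def earliest_finish(durations, deps):
    """Earliest finish time per test when each test starts as soon as
    all of its declared dependencies have finished."""
    graph = {test: deps.get(test, ()) for test in durations}
    finish = {}
    # static_order() yields every test after the tests it depends on.
    for test in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[test]), default=0)
        finish[test] = start + durations[test]
    return finish

# Hypothetical durations in arbitrary units.
durations = {"int2": 1, "int4": 1, "int8": 2, "float4": 1, "float8": 3,
             "numerology": 1, "slow_test": 10}
deps = {"numerology": ["int2", "int4", "int8", "float4", "float8"]}

finish = earliest_finish(durations, deps)
print(finish["numerology"], finish["slow_test"])
```

In this made-up case numerology finishes at 4, since it only waits for
float8. If slow_test happened to share a group with the type tests,
numerology could not even start before 10.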

Greetings,

Andres Freund
