Re: pgsql: Add parallel-aware hash joins.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2017-12-30 22:34:17
Message-ID: 30655.1514673257@sss.pgh.pa.us
Lists: pgsql-committers pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> This is explained by the early exit case in
>> ExecParallelHashEnsureBatchAccessors(). With just the right timing,
>> it finishes up not reporting the true nbatch number, and never calling
>> ExecParallelHashUpdateSpacePeak().

> Hi Tom,

> You mentioned that prairiedog sees the problem about one time in
> thirty. Would you mind checking if it goes away with this patch
> applied?

I've run 55 cycles of "make installcheck" without seeing a failure
with this patch installed. That's not enough to be totally sure,
of course, but I think this probably fixes it.

However ... I noticed that my other dinosaur, gaur, shows the other
failure mode we see in the buildfarm, the "increased_batches = t" diff,
and I can report that this patch does *not* help that. The underlying
EXPLAIN output goes from something like

! Finalize Aggregate  (cost=823.85..823.86 rows=1 width=8) (actual time=1378.102..1378.105 rows=1 loops=1)
!   ->  Gather  (cost=823.63..823.84 rows=2 width=8) (actual time=1377.909..1378.006 rows=3 loops=1)
!         Workers Planned: 2
!         Workers Launched: 2
!         ->  Partial Aggregate  (cost=823.63..823.64 rows=1 width=8) (actual time=1280.298..1280.302 rows=1 loops=3)
!               ->  Parallel Hash Join  (cost=387.50..802.80 rows=8333 width=0) (actual time=1070.179..1249.142 rows=6667 loops=3)
!                     Hash Cond: (r.id = s.id)
!                     ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.173..62.063 rows=6667 loops=3)
!                     ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4) (actual time=454.305..454.305 rows=6667 loops=3)
!                           Buckets: 4096  Batches: 8  Memory Usage: 208kB
!                           ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.178..67.115 rows=6667 loops=3)
! Planning time: 1.861 ms
! Execution time: 1687.311 ms

to something like

! Finalize Aggregate  (cost=823.85..823.86 rows=1 width=8) (actual time=1588.733..1588.737 rows=1 loops=1)
!   ->  Gather  (cost=823.63..823.84 rows=2 width=8) (actual time=1588.529..1588.634 rows=3 loops=1)
!         Workers Planned: 2
!         Workers Launched: 2
!         ->  Partial Aggregate  (cost=823.63..823.64 rows=1 width=8) (actual time=1492.631..1492.635 rows=1 loops=3)
!               ->  Parallel Hash Join  (cost=387.50..802.80 rows=8333 width=0) (actual time=1270.309..1451.501 rows=6667 loops=3)
!                     Hash Cond: (r.id = s.id)
!                     ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.219..158.144 rows=6667 loops=3)
!                     ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4) (actual time=634.614..634.614 rows=6667 loops=3)
!                           Buckets: 4096 (originally 4096)  Batches: 16 (originally 8)  Memory Usage: 176kB
!                           ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.182..120.074 rows=6667 loops=3)
! Planning time: 1.931 ms
! Execution time: 2219.417 ms

so again we have a case where the plan didn't change but the execution
behavior did. This isn't quite 100% reproducible on gaur/pademelon,
but it seems to fail more often than not, so I can poke into it
if you can say what info would be helpful.

regards, tom lane
