Re: pgsql: Add parallel-aware hash joins.

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2017-12-30 23:28:58
Message-ID: CAEepm=3SeFvsfnnOLSA3tLtBe-rtyL=c+vfzyPCsViBjk521qw@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Sun, Dec 31, 2017 at 11:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> You mentioned that prairiedog sees the problem about one time in
>> thirty. Would you mind checking if it goes away with this patch
>> applied?
>
> I've run 55 cycles of "make installcheck" without seeing a failure
> with this patch installed. That's not enough to be totally sure
> of course, but I think this probably fixes it.

Thanks!

> However ... I noticed that my other dinosaur gaur shows the other failure
> mode we see in the buildfarm, the "increased_batches = t" diff, and
> I can report that this patch does *not* help that. The underlying
> EXPLAIN output goes from something like
>
> ! Buckets: 4096 Batches: 8 Memory Usage: 208kB
>
> to something like
>
> ! Buckets: 4096 (originally 4096) Batches: 16 (originally 8) Memory Usage: 176kB
>
> so again we have a case where the plan didn't change but the execution
> behavior did. This isn't quite 100% reproducible on gaur/pademelon,
> but it fails more often than not seems like, so I can poke into it
> if you can say what info would be helpful.

Right. That's apparently unrelated and is the last build-farm issue
on my list (so far). I had noticed that certain BF animals are prone
to that particular failure, and they mostly have architectures that I
don't have, so a few things are probably just sized differently. At
first I thought I'd tweak the tests so that the parameters were always
stable, and I got as far as installing Debian on qemu-system-ppc (it
took a looong time to compile PostgreSQL), but that seems a bit cheap
and flimsy... better to fix the size estimation error.

I assume that what happens here is the planner's size estimation code
sometimes disagrees with Parallel Hash's chunk-based memory
accounting, even though in this case we had perfect tuple count and
tuple size information. In an earlier version of the patch set I
refactored the planner to be chunk-aware (even for parallel-oblivious
hash join), but later in the process I tried to simplify and shrink
the patch set and avoid making unnecessary changes to non-Parallel
Hash code paths. I think I'll need to make the planner aware of the
maximum amount of fragmentation possible when the join is parallel-aware
(something like: up to one tuple's worth at the end of each chunk, and
up to one whole wasted chunk per participating backend). More soon.
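
To make the fragmentation bound above concrete, here is a rough sketch of
the arithmetic, not the actual planner code. The function name, its
parameters, and the assumption that every tuple has the same size are all
hypothetical and only for illustration; the real accounting would also have
to deal with headers, alignment and variable-width tuples.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): pessimistic number of
 * bytes needed to store ntuples tuples of tuple_size bytes each when memory
 * is handed out in chunks of chunk_size bytes to nparticipants backends.
 *
 * Assumption: all tuples are the same size and a tuple never spans chunks.
 */
size_t
worst_case_chunked_bytes(size_t ntuples, size_t tuple_size,
                         size_t chunk_size, int nparticipants)
{
    size_t      tuples_per_chunk;
    size_t      full_chunks;

    assert(tuple_size > 0 && tuple_size <= chunk_size);
    assert(nparticipants > 0);

    /*
     * Rounding down here models the waste at the end of each chunk: the
     * leftover space is always less than one tuple's worth.
     */
    tuples_per_chunk = chunk_size / tuple_size;

    /* Chunks needed if tuples were packed by a single backend (round up). */
    full_chunks = (ntuples + tuples_per_chunk - 1) / tuples_per_chunk;

    /*
     * Each participating backend may be holding one chunk that it has
     * barely started to fill, so allow one extra chunk per participant.
     */
    return (full_chunks + (size_t) nparticipants) * chunk_size;
}
```

For example, 1000 tuples of 100 bytes are 100000 bytes of raw tuple data,
but with 32kB chunks and 3 participants this bound comes to 7 chunks, or
229376 bytes. A gap of that kind between the planner's estimate and the
executor's accounting would be enough to push a build over its memory
budget and trigger an extra batch increase.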

--
Thomas Munro
http://www.enterprisedb.com
