Re: pgsql: Add parallel-aware hash joins.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2017-12-31 00:00:19
Message-ID: 4001.1514678419@sss.pgh.pa.us
Lists: pgsql-committers pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Sun, Dec 31, 2017 at 11:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... This isn't quite 100% reproducible on gaur/pademelon,
>> but it seems like it fails more often than not, so I can poke into it
>> if you can say what info would be helpful.

> Right. That's apparently unrelated and is the last build-farm issue
> on my list (so far). I had noticed that certain BF animals are prone
> to that particular failure, and they mostly have architectures that I
> don't have so a few things are probably just differently sized. At
> first I thought I'd tweak the tests so that the parameters were always
> stable, and I got as far as installing Debian on qemu-system-ppc (it
> took a looong time to compile PostgreSQL), but that seems a bit cheap
> and flimsy... better to fix the size estimation error.

"Size estimation error"? Why do you think it's that? We have exactly
the same plan in both cases.

My guess is that one worker or the other ends up processing the whole
scan, or the vast majority of it, so that that worker's hash table has
to hold substantially more than half of the tuples and is thereby
forced to increase the number of batches.  I don't see
how you can expect to estimate that situation exactly; or if you do,
you'll be pessimizing the plan for cases where the split is more nearly
equal.
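
To put rough numbers on it (entirely made up, just to illustrate the
effect, and nothing to do with the actual test tables), a toy
calculation like this shows how a skewed split can blow past a
per-worker memory budget that an even split would respect:

/* Illustrative only: skewed vs. even split against a fixed budget. */
#include <stdio.h>

int
main(void)
{
    long    total_tuples = 100000;
    long    tuple_size = 64;                    /* bytes per hashed tuple */
    long    space_allowed = 4L * 1024 * 1024;   /* per-hash-table budget */

    long    even_share = total_tuples / 2;      /* what the planner assumes */
    long    skewed_share = 95000;               /* one worker reads nearly all */

    printf("even split:   %ld bytes, fits: %s\n",
           even_share * tuple_size,
           even_share * tuple_size <= space_allowed ? "yes" : "no");
    printf("skewed split: %ld bytes, fits: %s\n",
           skewed_share * tuple_size,
           skewed_share * tuple_size <= space_allowed ? "yes" : "no");

    /*
     * Once the skewed worker's table no longer fits, the executor has to
     * double the number of batches at run time, so the observed batch
     * count comes out higher than the planned one.
     */
    return 0;
}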

By this theory, the reason why certain BF members are more prone to the
failure is that they're single-processor machines, and perhaps have
kernels with relatively long scheduling quanta, so that it's more likely
that the worker that gets scheduled first is able to read the whole input
to the hash step.

> I assume that what happens here is that the planner's size estimation code
> sometimes disagrees with Parallel Hash's chunk-based memory
> accounting, even though in this case we had perfect tuple count and
> tuple size information. In an earlier version of the patch set I
> refactored the planner to be chunk-aware (even for parallel-oblivious
> hash join), but later in the process I tried to simplify and shrink
> the patch set and avoid making unnecessary changes to non-Parallel
> Hash code paths. I think I'll need to make the planner aware of the
> maximum amount of fragmentation possible when parallel-aware
> (something like: up to one tuple's worth at the end of each chunk, and
> up to one whole wasted chunk per participating backend). More soon.
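
For reference, the worst-case bound you're sketching there would amount
to something like the following (the chunk size, function name, and
numbers are invented for illustration; this is not the actual
accounting code):

/* Illustrative worst-case fragmentation bound for chunked allocation. */
#include <stdio.h>

#define CHUNK_SIZE  (32 * 1024)     /* bytes per allocation chunk (made up) */

static long
worst_case_hash_space(long ntuples, long tuple_size, int nparticipants)
{
    long    payload = ntuples * tuple_size;
    long    nchunks = (payload + CHUNK_SIZE - 1) / CHUNK_SIZE;

    /*
     * Up to one tuple's worth lost at the end of each chunk, plus up to
     * one whole underused chunk per participating backend.
     */
    return payload
        + nchunks * tuple_size
        + (long) nparticipants * CHUNK_SIZE;
}

int
main(void)
{
    printf("estimate with worst-case slop: %ld bytes\n",
           worst_case_hash_space(100000, 64, 3));
    return 0;
}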

I'm really dubious that trying to model the executor's space consumption
exactly is a good idea, even if it did fix this specific problem.
That would expend extra planner cycles and pose a continuing maintenance
gotcha.

regards, tom lane
