Re: pgsql: Add parallel-aware hash joins.

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2017-12-30 21:59:26
Message-ID: CAEepm=02+JsOQc+rmxcGBh4OHyMrzmWHA3ZOMiiBta_2LT0JXA@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Sun, Dec 31, 2017 at 5:16 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> In a race case, EXPLAIN ANALYZE could fail to display correct nbatch and size
>> information. Refactor so that participants report only on batches they worked
>> on rather than trying to report on all of them, and teach explain.c to
>> consider the HashInstrumentation object from all participants instead of
>> picking the first one it can find. This should fix an occasional build farm
>> failure in the "join" regression test.
>
> This seems buggy independently of whether it fixes the issue on prairiedog,
> right? So I'm inclined to go ahead and just fix it...

+1

>> + /*
>> + * Merge results from workers. In the parallel-oblivious case, the
>> + * results from all participants should be identical, except where
>> + * participants didn't run the join at all so have no data. In the
>> + * parallel-aware case, we need to aggregate the results. Each worker may
>> + * have seen a different subset of batches and we want to report the peak
>> + * memory usage across all batches.
>> + */
>
> It's not necessarily the peak though, right? The largest batches might
> not be read in at the same time. I'm fine with approximating it as such,
> just want to make sure I understand.

Yeah, it's not attempting to report the true simultaneous peak memory
usage. It's only reporting the largest individual hash table ever
loaded. In a multi-batch join more than one hash table may be loaded
at the same time -- up to the number of participants -- but I'm not
yet attempting to reflect that. On the one hand, that's a bit like
the way we show the size for parallel-oblivious hash joins: each
participant used the reported amount of memory at approximately the
same time. On the other hand, the total simultaneous memory usage
for a parallel-aware hash join is capped by both nbatch and
nparticipants: the true simultaneous peak must be <=
largest_hash_table * Min(nbatch, nparticipants). I considered
various ways to capture and report this. The 0007 patch in the v26
patchset showed per-worker information separately, but I abandoned
it because it was useless and confusing. Another idea would be to
report the sum of the nparticipants largest hash tables, or just to
assume all batches are similarly sized and use the formula above.
Yet another would be to track which hash tables or memory regions
were actually loaded at the same time, using an incremental shared
counter maintained as hash chunks and bucket arrays are allocated
and freed (see the sketch below). But I figured we should go with
something super simple for now and discuss better ideas as a later
evolution.

>> [code]
>
> I bet pgindent will not like this layout.

pgindented.

> Ho hum. Is this really, as you say above, an "aggregate the results"?

Yeah, misleading/stupid use of "aggregate" (SQL MAX() is an
aggregate...). Offending word removed.
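To make that concrete, the merge is just a field-wise maximum over
participants, roughly as below (the helper name is invented here;
the fields match PostgreSQL's HashInstrumentation from nodeHash.h,
with Size simplified to size_t to keep the sketch self-contained):

#include <stddef.h>

#define Max(x, y) ((x) > (y) ? (x) : (y))	/* as in c.h */

typedef struct HashInstrumentation
{
	int		nbuckets;			/* number of buckets at end of execution */
	int		nbuckets_original;	/* planned number of buckets */
	int		nbatch;				/* number of batches at end of execution */
	int		nbatch_original;	/* planned number of batches */
	size_t	space_peak;			/* peak memory usage in bytes */
} HashInstrumentation;

/*
 * Fold one participant's counters into the combined result, skipping
 * participants that never ran the join (they report nbatch == 0).
 */
static void
accumulate_hash_instrumentation(HashInstrumentation *result,
								const HashInstrumentation *worker)
{
	if (worker->nbatch > 0)
	{
		result->nbuckets = Max(result->nbuckets, worker->nbuckets);
		result->nbuckets_original = Max(result->nbuckets_original,
										worker->nbuckets_original);
		result->nbatch = Max(result->nbatch, worker->nbatch);
		result->nbatch_original = Max(result->nbatch_original,
									  worker->nbatch_original);
		result->space_peak = Max(result->space_peak, worker->space_peak);
	}
}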

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
fix-phj-explain-v2.patch application/octet-stream 8.4 KB
