Re: pgsql: Add parallel-aware hash joins.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2017-12-30 16:16:34
Message-ID: 20171230161634.3lpvgerqsysricaa@alap3.anarazel.de
Lists: pgsql-committers pgsql-hackers

Hi,

On 2017-12-31 02:51:26 +1300, Thomas Munro wrote:
> You mentioned that prairiedog sees the problem about one time in
> thirty. Would you mind checking if it goes away with this patch
> applied?
>
> --
> Thomas Munro
> http://www.enterprisedb.com

> From cbed027275039cc5debf8db89342a133a831c9ca Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
> Date: Sun, 31 Dec 2017 02:03:07 +1300
> Subject: [PATCH] Fix EXPLAIN ANALYZE output for Parallel Hash.
>
> In a race case, EXPLAIN ANALYZE could fail to display correct nbatch and size
> information. Refactor so that participants report only on batches they worked
> on rather than trying to report on all of them, and teach explain.c to
> consider the HashInstrumentation object from all participants instead of
> picking the first one it can find. This should fix an occasional build farm
> failure in the "join" regression test.

This seems buggy independent of whether it fixes the issue on prairiedog,
right? So I'm inclined to go ahead and just fix it...

> +	/*
> +	 * Merge results from workers.  In the parallel-oblivious case, the
> +	 * results from all participants should be identical, except where
> +	 * participants didn't run the join at all so have no data.  In the
> +	 * parallel-aware case, we need to aggregate the results.  Each worker may
> +	 * have seen a different subset of batches and we want to report the peak
> +	 * memory usage across all batches.
> +	 */

It's not necessarily the peak though, right? The largest batches might
not be read in at the same time. I'm fine with approximating it as such,
just want to make sure I understand.
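To make the semantics concrete, here is a minimal sketch (hypothetical numbers and field names, not the committed C code) of the merge rule the patch implements: bucket counts are simply overwritten since all participants should agree, while nbatch and space_peak take the maximum across workers. It shows why the merged value is an approximation: it is the largest single observation, not necessarily a memory level that existed at any one instant.

```python
# Sketch of the patch's per-worker instrumentation merge (hypothetical
# dict-based representation; the real code uses HashInstrumentation structs).

def merge_instrumentation(workers):
    """Merge per-worker hash instrumentation the way the patch does:
    overwrite nbuckets (all participants should agree), take Max() for
    nbatch (a late-starting worker may have missed the final number)
    and for space_peak (largest peak observed by any worker)."""
    merged = {"nbuckets": 0, "nbatch": 0, "space_peak": 0}
    for w in workers:
        if w["nbatch"] == 0:
            continue  # this participant never ran the join; no data
        merged["nbuckets"] = w["nbuckets"]  # all agree; overwrite each time
        merged["nbatch"] = max(merged["nbatch"], w["nbatch"])
        merged["space_peak"] = max(merged["space_peak"], w["space_peak"])
    return merged

# Two workers that saw different subsets of batches at different times,
# plus a late starter that missed the join entirely.
workers = [
    {"nbuckets": 1024, "nbatch": 4, "space_peak": 900_000},
    {"nbuckets": 1024, "nbatch": 8, "space_peak": 700_000},
    {"nbuckets": 0,    "nbatch": 0, "space_peak": 0},
]
print(merge_instrumentation(workers))
```

The merged result reports the 900 kB peak from the first worker even though the two workers' largest batches need not have been in memory simultaneously, which is the approximation being discussed.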

> +	if (hashstate->shared_info)
> 	{
> 		SharedHashInfo *shared_info = hashstate->shared_info;
> 		int			i;
>
> -		/* Find the first worker that built a hash table. */
> 		for (i = 0; i < shared_info->num_workers; ++i)
> 		{
> -			if (shared_info->hinstrument[i].nbatch > 0)
> +			HashInstrumentation *worker_hi = &shared_info->hinstrument[i];
> +
> +			if (worker_hi->nbatch > 0)
> 			{
> -				hinstrument = &shared_info->hinstrument[i];
> -				break;
> +				/*
> +				 * Every participant should agree on the buckets, so to be
> +				 * sure we have a value we'll just overwrite each time.
> +				 */
> +				hinstrument.nbuckets = worker_hi->nbuckets;
> +				hinstrument.nbuckets_original = worker_hi->nbuckets_original;
> +				/*
> +				 * Normally every participant should agree on the number of
> +				 * batches too, but it's possible for a backend that started
> +				 * late and missed the whole join not to have the final nbatch
> +				 * number.  So we'll take the largest number.
> +				 */
> +				hinstrument.nbatch = Max(hinstrument.nbatch, worker_hi->nbatch);
> +				hinstrument.nbatch_original = worker_hi->nbatch_original;
> +				/*
> +				 * In a parallel-aware hash join, for now we report the
> +				 * maximum peak memory reported by any worker.
> +				 */
> +				hinstrument.space_peak =
> +					Max(hinstrument.space_peak, worker_hi->space_peak);

I bet pgindent will not like this layout.

Ho hum. Is this really, as you say above, an "aggregate the results"?

Greetings,

Andres Freund
