Re: pg13dev: explain partial, parallel hashagg, and memory use

From: James Coleman <jtc331(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <jdavis(at)postgresql(dot)org>
Subject: Re: pg13dev: explain partial, parallel hashagg, and memory use
Date: 2020-08-05 02:01:18
Message-ID: CAAaqYe-sg7cHgayWwKWZtSyFr5LQEiMExiqmjeHUOKXxHKxWjQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 4, 2020 at 9:44 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Wed, 5 Aug 2020 at 13:21, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > I'm testing with a customer's data on pg13dev and got output for which Peak
> > Memory doesn't look right/useful. I reproduced it on 565f16902.
>
> Likely the sanity of those results depends on whether you think that
> the Memory Usage reported outside of the workers is meant to be the
> sum of all processes or the memory usage for the leader backend.
>
> All that's going on here is that the Parallel Append is using some
> parallel safe paths and giving one to each worker. The 2 workers take
> the first 2 subpaths and the leader takes the third. The memory usage
> reported helps confirm that's the case.
>
> Can you explain what you'd want to see changed about this? Or do you
> want to see the non-parallel worker memory be the sum of all workers?
> Sort does not seem to do that, so I'm not sure if we should consider
> hash agg as an exception to that.

I've always found the way we report parallel workers in EXPLAIN quite
confusing. I realize it matches the actual implementation model (the
leader often is also "another worker"), but I think the natural
expectation from a user perspective would be that we'd show as
workers all backends (including the leader) that did work, and then
aggregate them into a summary line (where the leader is displayed now).

In the current output there's nothing really to hint to the user that
the model is leader + workers and that the "summary" line is really
the leader. If I were to design this from scratch, I'd want to propose
doing what I said above (summary aggregate line + treat leader as a
worker line, likely with a "leader" tag), but that seems like a big
change to make now. On the other hand, perhaps designating what looks
like a summary line as the "leader" or some such would help clear up
the confusion? Perhaps it could also say "Participating" or
"Non-participating"?
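
To make the shape of that idea concrete, here is a rough sketch in
Python. It is purely illustrative: the Participant type, the render()
helper, the labels, and the choice to sum peak memory across
participating backends are all assumptions for the sake of the
example, not how EXPLAIN is implemented or what its output looks like
today.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    label: str            # hypothetical tag, e.g. "Leader" or "Worker 0"
    participating: bool   # did this backend actually execute the node?
    peak_memory_kb: int   # hypothetical per-process peak memory


def render(node_name, participants):
    """Return one summary line followed by one line per backend.

    Assumption: the summary sums peak memory over the backends that
    participated; a real design might prefer max or something else.
    """
    active = [p for p in participants if p.participating]
    total_kb = sum(p.peak_memory_kb for p in active)
    lines = [
        f"{node_name}  (summary: {len(active)} processes, "
        f"Peak Memory Usage: {total_kb}kB)"
    ]
    for p in participants:
        state = "Participating" if p.participating else "Non-participating"
        lines.append(
            f"  {p.label} ({state}): Peak Memory Usage: {p.peak_memory_kb}kB"
        )
    return lines


if __name__ == "__main__":
    # Made-up numbers, only to show the layout.
    demo = [
        Participant("Leader", True, 100),
        Participant("Worker 0", True, 200),
        Participant("Worker 1", False, 0),
    ]
    print("\n".join(render("HashAggregate", demo)))
```

The point is only that the leader gets its own tagged line like any
other backend, and the top line is unambiguously an aggregate.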

James
