Re: Parallel leader process info in EXPLAIN

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Date: 2020-01-26 22:49:10
Message-ID: 18323.1580078950@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> I think I'm going to abandon 0002 for now, because that stuff is being
> refactored independently over here, so rebasing would be futile:
> https://www.postgresql.org/message-id/flat/CAOtHd0AvAA8CLB9Xz0wnxu1U%3DzJCKrr1r4QwwXi_kcQsHDVU%3DQ%40mail.gmail.com

Yeah, your 0002 needs some rethinking. I kind of like the proposed
change in the text-format output:

         Workers Launched: 4
         -> Sort (actual rows=2000 loops=15)
               Sort Key: tenk1.ten
-              Sort Method: quicksort Memory: xxx
+              Leader: Sort Method: quicksort Memory: xxx
               Worker 0: Sort Method: quicksort Memory: xxx
               Worker 1: Sort Method: quicksort Memory: xxx
               Worker 2: Sort Method: quicksort Memory: xxx

but it's quite unclear to me how that translates into non-text
formats, especially if we're not to break invariants about which
fields are present in a non-text output structure (cf [1]).

I've occasionally wondered whether we'd be better off presenting
this info as if the leader were "worker 0" and then the N workers
are workers 1 to N. I've not worked out the implications of that
in any detail though. It's fairly easy to see what to do for
fields that can be aggregated (the numbers printed for the node
as a whole are totals), but it doesn't help us any with something
like Sort Method.
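
A rough sketch of that numbering idea, written in Python rather than PostgreSQL's C purely for illustration. Every name here (ProcStats, number_processes, node_total_rows, common_sort_method) is hypothetical, and the row counts are made up. It shows why a summable field such as a row count gets a sensible node-level value while Sort Method does not.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProcStats:
    """Hypothetical per-process stats for one plan node (not a PostgreSQL struct)."""
    rows: int                          # summable across processes
    sort_method: Optional[str] = None  # per-process only, cannot be summed


def number_processes(leader: ProcStats,
                     workers: List[ProcStats]) -> List[Tuple[int, ProcStats]]:
    """Present the leader as "worker 0"; the N real workers become 1..N."""
    return list(enumerate([leader, *workers]))


def node_total_rows(leader: ProcStats, workers: List[ProcStats]) -> int:
    """The number printed for the node as a whole is the total over all processes."""
    return sum(p.rows for p in [leader, *workers])


def common_sort_method(leader: ProcStats,
                       workers: List[ProcStats]) -> Optional[str]:
    """Return a node-level Sort Method only if every process agrees, else None."""
    methods = {p.sort_method for p in [leader, *workers]}
    return methods.pop() if len(methods) == 1 else None


# Made-up numbers, only to exercise the functions above.
leader = ProcStats(rows=400, sort_method="quicksort")
workers = [ProcStats(500, "quicksort"), ProcStats(600, "external merge")]

numbered = number_processes(leader, workers)
total = node_total_rows(leader, workers)
for n, p in numbered:
    print(f"Worker {n}: rows={p.rows} Sort Method: {p.sort_method}")
print(f"node total rows={total}, "
      f"node Sort Method={common_sort_method(leader, workers)}")
```

When the processes disagree, as they do here, there is no single Sort Method to report at the node level, which is the gap the renumbering leaves open.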

On a narrower note, I'm not at all happy with the fact that 0001
adds yet another field to *every* PlanState. I think this is
doubling down on a fundamentally wrong decision to have
ExecParallelRetrieveInstrumentation do some aggregation immediately.
I think we should abandon that and just say that it returns the raw
leader and per-worker data, and then explain.c can aggregate as it
wishes.
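
To make the proposed split concrete, here is a minimal Python sketch. It is not the actual executor code: Instr, NodeInstr, retrieve_raw and aggregate_for_display are hypothetical stand-ins, and the per-process numbers are invented so that they add up to the rows=2000 loops=15 shown in the plan excerpt. The retrieval step only copies; all summing happens on the display side.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instr:
    """Hypothetical per-process counters for one plan node."""
    ntuples: int = 0
    nloops: int = 0


@dataclass
class NodeInstr:
    """Raw data kept side by side: one leader entry plus one entry per worker."""
    leader: Instr
    workers: List[Instr] = field(default_factory=list)


def retrieve_raw(leader: Instr, worker_slots: List[Instr]) -> NodeInstr:
    """Stand-in for the retrieval step: copy the data, do no aggregation."""
    return NodeInstr(leader=leader, workers=list(worker_slots))


def aggregate_for_display(node: NodeInstr) -> Instr:
    """Stand-in for the display side: sum only when a total is wanted."""
    total = Instr()
    for p in [node.leader, *node.workers]:
        total.ntuples += p.ntuples
        total.nloops += p.nloops
    return total


# Invented numbers: a leader and 4 workers, each with 3 loops of 2000 rows.
node = retrieve_raw(Instr(ntuples=6000, nloops=3),
                    [Instr(ntuples=6000, nloops=3) for _ in range(4)])
total = aggregate_for_display(node)
print(f"actual rows={total.ntuples // total.nloops} loops={total.nloops}")
```

Because the leader's numbers are never folded into a running total, the display code can still print a separate leader line without needing an extra per-node field to remember them.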

regards, tom lane

[1] https://www.postgresql.org/message-id/19416.1580069629%40sss.pgh.pa.us
