Re: Parallel Append implementation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-10-12 18:59:31
Message-ID: CA+TgmoaHVinxG=3h6qBAsyV8xaDyQwbzK7YZnYfE8nJFMK1=FA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 11, 2017 at 8:51 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> [ new patch ]

+ <entry><literal>parallel_append</></entry>
+ <entry>Waiting to choose the next subplan during Parallel Append plan
+ execution.</entry>
+ </row>
+ <row>

Probably needs to update the morerows value of some earlier entry.

+ <primary><varname>enable_parallelappend</> configuration
parameter</primary>

How about enable_parallel_append?
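
If we rename it, the guc.c entry would presumably end up looking more
or less like the other enable_* booleans - just a sketch, with
placeholder description text:

{
    {"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
        gettext_noop("Enables the planner's use of parallel append plans."),
        NULL
    },
    &enable_parallel_append,
    true,
    NULL, NULL, NULL
},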

+ * pa_finished : workers currently executing the subplan. A worker which

The way the colon is used here is not a standard comment style for PostgreSQL.
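
For reference, nearby structs mostly comment their fields like this
(the wording here is made up, just to show the shape):

    bool       *pa_finished;    /* pa_finished[i] is set once no new worker
                                 * should pick subplan i */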

+ * Go on to the "next" subplan. If no more subplans, return the empty
+ * slot set up for us by ExecInitAppend.
+ * Note: Parallel-aware Append follows different logic for choosing the
+ * next subplan.

Formatting looks wrong, and moreover I don't think this is the right
way of handling this comment anyway. Move the existing comment inside
the if (!node->padesc) block and leave it unchanged; the else block
explains the differences for parallel append.
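
Roughly this shape, i.e. just a sketch of where the comments should
live, not the exact wording:

if (node->padesc == NULL)
{
    /*
     * Go on to the "next" subplan in order.  If no more subplans, return
     * the empty slot set up for us by ExecInitAppend.
     */
    /* ... advance as_whichplan exactly as the code does today ... */
}
else
{
    /*
     * Parallel-aware Append: choose the next subplan from the shared
     * state via exec_append_parallel_next() instead.
     */
    /* ... */
}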

+ * ExecAppendEstimate
+ *
+ * estimates the space required to serialize Append node.

Ugh, this is wrong, but I notice it follows various other
equally-wrong comments for other parallel-aware node types. I guess
I'll go fix that. We are not serializing the Append node here.
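
Something along these lines would be closer to the truth (exact
wording negotiable):

/* ----------------------------------------------------------------
 *      ExecAppendEstimate
 *
 *      Compute the amount of space we'll need in the parallel
 *      query DSM, and inform pcxt->estimator about our needs.
 * ----------------------------------------------------------------
 */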

I do not think that it's a good idea to call
exec_append_parallel_next() from ExecAppendInitializeDSM,
ExecAppendReInitializeDSM, and ExecAppendInitializeWorker. We want to
postpone selecting which plan to run until we're actually ready to run
that plan. Otherwise, for example, the leader might seize a
non-partial plan (if only such plans are included in the Parallel
Append) when it isn't really necessary for it to do so. Even if the
workers would have reached those plans and started returning tuples to
the leader before the leader actually needed to grab one itself, too
bad: the leader has already claimed that plan and must now run it.

I concede that's not a high-probability scenario, but I still maintain
that it is better for processes not to claim a subplan until the last
possible moment. I think we need to initialize as_whichplan to
PA_INVALID_PLAN and then fix it when ExecProcNode() is called for the
first time.
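
Roughly, I'm imagining something like this at the top of ExecAppend()
- the return-value convention of exec_append_parallel_next() is just a
guess on my part:

if (node->padesc && node->as_whichplan == PA_INVALID_PLAN)
{
    /* First call in this process: only now claim a subplan. */
    if (!exec_append_parallel_next(node))
        return ExecClearTuple(node->ps.ps_ResultTupleSlot);
}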

+ if (!IsParallelWorker())

This is not a great test, because it would do the wrong thing if we
ever allowed an SQL function called from a parallel worker to run a
parallel query of its own. Currently that's not allowed, but we might
want to allow it someday. What we really want to test is whether
we're the leader for *this* query. Maybe use a flag in the
AppendState for that, and set it correctly in
ExecAppendInitializeWorker.
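
Something along these lines, say (the field name is made up):

/* In AppendState: */
bool        as_leader;      /* are we the leader process for this query? */

/* ExecInitAppend(): */
appendstate->as_leader = true;

/* ExecAppendInitializeWorker(): */
node->as_leader = false;

...and then test node->as_leader instead of !IsParallelWorker().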

I think maybe the loop in exec_append_parallel_next should look more like this:

/* Pick the next plan. */
state->as_whichplan = padesc->pa_next_plan;
if (state->as_whichplan != PA_INVALID_PLAN)
{
    int         nextplan = state->as_whichplan;

    /*
     * Mark non-partial plans done immediately, so that they can't be
     * picked again.
     */
    if (nextplan < first_partial_plan)
        padesc->pa_finished[nextplan] = true;

    /* Figure out what plan the next worker should pick. */
    do
    {
        /*
         * If we've run through all the plans, loop back through the
         * partial plans only.
         */
        if (++nextplan >= state->as_nplans)
            nextplan = first_partial_plan;

        /* No plans remaining, or tried them all?  Then give up. */
        if (nextplan == state->as_whichplan ||
            nextplan >= state->as_nplans)
        {
            nextplan = PA_INVALID_PLAN;
            break;
        }
    } while (padesc->pa_finished[nextplan]);

    /* Store the calculated next plan back into shared memory. */
    padesc->pa_next_plan = nextplan;
}

This might not be exactly right and the comments may need work, but
here are a couple of points:

- As you have it coded, the loop exit condition is whichplan !=
PA_INVALID_PLAN, but that's probably an uncommon case and you have two
other ways out of the loop. It's easier to understand the code if the
loop condition corresponds to the most common way of exiting the loop,
and any break is for some corner case.

- Don't need a separate call to exec_append_get_next_plan; it's all
handled here (and, I think, pretty compactly).

- No need for pa_first_plan any more. Looping back to
first_partial_plan is a fine substitute, because by the time anybody
loops around, pa_first_plan would equal first_partial_plan anyway
(unless there's a bug).

- In your code, the value in shared memory is the point at which to
start the search for the next plan. Here, I made it the value that
the next worker should adopt without question. Another option would
be to make it the value that the last worker adopted. I think that
either that option or what I did above is slightly better than what
you have, because as you have it, you've got to use the
increment-with-looping logic in two different places, whereas either of
those options needs it in only one place, which makes this a little
simpler.

None of this is really a big deal, I suppose, but I find the logic here
rather sprawling right now, and I think we should try to tighten it up
as much as possible.

I only looked over the executor changes on this pass, not the planner stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
