Re: Asynchronous Append on postgres_fdw nodes.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: etsuro(dot)fujita(at)gmail(dot)com
Cc: pryzby(at)telsasoft(dot)com, a(dot)lepikhov(at)postgrespro(dot)ru, movead(dot)li(at)highgo(dot)ca, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Asynchronous Append on postgres_fdw nodes.
Date: 2020-12-14 08:56:23
Message-ID: 20201214.175623.1287257830379316172.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Sat, 12 Dec 2020 19:06:51 +0900, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote in
> On Fri, Nov 20, 2020 at 8:16 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > I looked through the nodeAppend.c and postgres_fdw.c part and those
> > are I think the core of this patch.
>
> Thanks again for the review!
>
> > - * figure out which subplan we are currently processing
> > + * try to get a tuple from async subplans
> > + */
> > + if (!bms_is_empty(node->as_needrequest) ||
> > + (node->as_syncdone && !bms_is_empty(node->as_asyncpending)))
> > + {
> > + if (ExecAppendAsyncGetNext(node, &result))
> > + return result;
> >
> > The function ExecAppendAsyncGetNext() is a function called only here,
> > and contains only 31 lines. It doesn't seem to me that the separation
> > makes the code more readable.
>
> Considering the original ExecAppend() is about 50 lines long, 31 lines
> of code would not be small. So I'd vote for separating it into
> another function as proposed.

Ok, I no longer oppose to separating some part from ExecAppend().

> > - /* choose new subplan; if none, we're done */
> > - if (!node->choose_next_subplan(node))
> > + /* wait or poll async events */
> > + if (!bms_is_empty(node->as_asyncpending))
> > + {
> > + Assert(!node->as_syncdone);
> > + Assert(bms_is_empty(node->as_needrequest));
> > + ExecAppendAsyncEventWait(node);
> >
> > You moved the function to wait for events from execAsync to
> > nodeAppend. The former is a generic module that can be used from any
> > kind of executor nodes, but the latter is specialized for nodeAppend.
> > In other words, the abstraction level is lowered here. What is the
> > reason for the change?
>
> The reason is just because that function is only called from
> ExecAppend(). I put some functions only called from nodeAppend.c in
> execAsync.c, though.

(I think) You told me that you preferred the genericity of the
original interface, but you're doing the opposite. If you think we
can move such a generic feature to a part of Append node, all other
features can be move the same way. I guess there's a reason you want
only the this specific feature out of all of them be Append-spcific
and I want to know that.

> > + /* Perform the actual callback. */
> > + ExecAsyncRequest(areq);
> > + if (ExecAppendAsyncResponse(areq))
> > + {
> > + Assert(!TupIsNull(areq->result));
> > + *result = areq->result;
> >
> > Putting aside the name of the functions, the first two function are
> > used only this way at only two places. ExecAsyncRequest(areq) tells
> > fdw to store the first tuple among the already received ones to areq,
> > and ExecAppendAsyncResponse(areq) is checking the result is actually
> > set. Finally the result is retrieved directory from areq->result.
> > What is the reason that the two functions are separately exists?
>
> I think that when an async-aware node gets a tuple from an
> async-capable node, they should use ExecAsyncRequest() /
> ExecAyncHogeResponse() rather than ExecProcNode() [1]. I modified the
> patch so that ExecAppendAsyncResponse() is called from Append, but to
> support bubbling up the plan tree discussed in [2], I think it should
> be called from ForeignScans (the sides of async-capable nodes). Am I
> right? Anyway, I’ll rename ExecAppendAyncResponse() to the one
> proposed in Robert’s original patch.

Even though I understand the concept but to make work it we need to
remember the parent *async* node somewhere. In my faint memory the
very early patch did something like that.

So I think just providing ExecAsyncResponse() doesn't make it
true. But if we make it true, it would be something like
partially-reversed steps from what the current Exec*()s do for some of
the existing nodes and further code is required for some other nodes
like WindowFunction. Bubbling up works only in very simple cases where
a returned tuple is thrown up to further parent as-is or at least when
the node convers a tuple into another shape. If an async-receiver node
wants to process multiple tuples from a child or from multiple
children, it is no longer be just a bubbling up.

That being said, we could avoid passing (a-kind-of) side-channel
information when ExecProcNode is called by providing
ExecAsyncResponse(). But I don't think the "side-channel" is not a
problem since it is just another state of the node.

And.. I think the reason I feel uneasy for the patch may be that the
patch uses the interface names in somewhat different context.
Origianlly the fraemework resides in-between executor nodes, not on a
node of either side. ExecAsyncNotify() notifies the requestee about an
event and ExecAsyncResonse() notifies the requestor about a new
tuple. I don't feel strangeness in this usage. But this patch feels to
me using the same names in different (and somewhat wrong) context.

> > + /* Perform the actual callback. */
> > + ExecAsyncNotify(areq);
> >
> > Mmm. The usage of the function (or its name) looks completely reverse
> > to me. I think FDW should NOTIFY to exec nodes that the new tuple
> > gets available but the reverse is nonsense. What the function is
> > actually doing is to REQUEST fdw to fetch tuples that are expected to
> > have arrived, which is different from what the name suggests.
>
> As mentioned in a previous email, this is an FDW callback routine
> revived from Robert’s patch. I think the naming is reasonable,
> because the callback routine notifies the FDW of readiness of a file
> descriptor. And actually, the callback routine tells the core whether
> the corresponding ForeignScan node is ready for a new request or not,
> by setting the callback_pending flag accordingly.

Hmm. Agreed. The word "callback" is also used there [3]... I
remember and it seems reasonable that the core calls AsyncNotify() on
FDW and the FDW calls ExecForeignScan as a response to it and notify
back to core of that using ExecAsyncRequestDone(). But the patch here
feels a little strange, or uneasy, to me.

[3] https://www.postgresql.org/message-id/20161018.103051.30820907.horiguchi.kyotaro%40lab.ntt.co.jp

> > postgres_fdw.c
> >
> > > postgresIterateForeignScan(ForeignScanState *node)
> > > {
> > > PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
> > > TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
> > >
> > > /*
> > > * If this is the first call after Begin or ReScan, we need to create the
> > > * cursor on the remote side.
> > > */
> > > if (!fsstate->cursor_exists)
> > > create_cursor(node);
> >
> > With the patch, cursors are also created in another place so at least
> > the comment is wrong.
>
> Good catch! Will fix.
>
> > That being said, I think we should unify the
> > code except the differences between async and sync. For example, if
> > the fetch_more_data_begin() needs to be called only for async
> > fetching, the cursor should be created before calling the function, in
> > the code path common with sync fetching.
>
> I think that that would make the code easier to understand, but I’m
> not 100% sure we really need to do so.

And I believe that we don't tolerate even the slightest performance
degradation.

> Best regards,
> Etsuro Fujita
>
> [1] https://www.postgresql.org/message-id/CA%2BTgmoYrbgTBnLwnr1v%3Dpk%2BC%3DznWg7AgV9%3DM9ehrq6TDexPQNw%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CA%2BTgmoZSWnhy%3DSB3ggQcB6EqKxzbNeNn%3DEfwARnCS5tyhhBNcw%40mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-12-14 09:04:37 Some error messages are omitted while recovery.
Previous Message Heikki Linnakangas 2020-12-14 08:51:45 Re: pg_basebackup caused FailedAssertion