Re: Asynchronous Append on postgres_fdw nodes.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: etsuro(dot)fujita(at)gmail(dot)com
Cc: pryzby(at)telsasoft(dot)com, a(dot)lepikhov(at)postgrespro(dot)ru, movead(dot)li(at)highgo(dot)ca, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Asynchronous Append on postgres_fdw nodes.
Date: 2021-02-18 06:15:57
Message-ID: 20210218.151557.1106337659785292399.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Sorry that I haven't been able to respond.

At Thu, 18 Feb 2021 11:51:59 +0900, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote in
> On Wed, Feb 10, 2021 at 9:31 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> > Please find attached an updated patch.
>
> I noticed that this doesn’t work for cases where ForeignScans are
> executed inside functions, and I don’t have any simple solution for

Ah, concurrent fetches in different plan trees? (To be fair, I hadn't
noticed that case either :p) The same can happen with an extension
that is called via hooks.

> that. So I’m getting back to what Horiguchi-san proposed for
> postgres_fdw to handle concurrent fetches from a remote server
> performed by multiple ForeignScan nodes that use the same connection.
> As discussed before, we would need to create a scheduler for
> performing such fetches in a more optimized way to avoid a performance
> degradation in some cases, but that wouldn’t be easy. Instead, how

If the "degradation" means degradation caused by repeated creation of
remote cursors: every node on the same connection creates its own
cursor, named "c<n>", and that cursor is never "re"-created in any
case.
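Concretely, each scan on a shared connection drives its own named cursor, along these lines (table names and the exact statements are invented for illustration; the "c<n>" naming and batched FETCH follow what postgres_fdw sends):

```sql
-- Two ForeignScan nodes sharing one connection, each with its own cursor.
DECLARE c1 CURSOR FOR SELECT a, b FROM remote_tab1;
DECLARE c2 CURSOR FOR SELECT x FROM remote_tab2;

FETCH 100 FROM c1;  -- one round of fetch_size tuples for node A
FETCH 100 FROM c2;  -- one round for node B; the cursors persist,
FETCH 100 FROM c1;  -- so no cursor is ever re-created between rounds

CLOSE c1;
CLOSE c2;
```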

If the "degradation" means that my patch needs to wait for the
previous prefetching query to return tuples before sending a new query
(vacate_connection()), it just moves the wait from just before sending
the new query to just before fetching the next round of the previous
node. The only case where it becomes a visible degradation is when
the tuples in the next round are not wanted by the upper nodes.

unpatched

nodeA <tuple exhausted>
<send prefetching FETCH A>
<return the last tuple of the last round>
nodeB !!<wait for FETCH A returns>
<send FETCH B>
!!<wait for FETCH B returns>
<return tuple just returned>
nodeA <return already fetched tuple>

patched

nodeA <tuple exhausted>
<return the last tuple of the last round>
nodeB <send FETCH B>
!!<wait for FETCH B returns>
<return the first tuple of the round>
nodeA <send FETCH A>
!!<wait for FETCH A returns>
<return the first tuple of the round>

That happens when the upper node stops just after the internal
tuplestore is emptied, and the probability is one in fetch_tuples.
(It is not stochastic, so if a query suffers the degradation, it
always suffers unless fetch_tuples is changed.) I'm still not sure
whether that degree of degradation becomes a show-stopper.

> degradation in some cases, but that wouldn’t be easy. Instead, how
> about reducing concurrency as an alternative? In his proposal,
> postgres_fdw was modified to perform prefetching pretty aggressively,
> so I mean removing aggressive prefetching. I think we could add it to
> postgres_fdw later maybe as the server/table options. Sorry for the
> back and forth.

That was the natural extension from non-aggressive prefetching.
However, maybe we can live without it, since if someone needs more
speed, it is enough to give every remote table a dedicated
connection.
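For comparison, the existing fetch_size option already follows the server/table options pattern mentioned above, so a prefetching knob could plausibly sit beside it (server and table names here are invented; fetch_size is a real postgres_fdw option):

```sql
-- Per-server default batch size for remote fetches.
ALTER SERVER loopback OPTIONS (ADD fetch_size '200');

-- Per-table override, taking precedence over the server setting.
ALTER FOREIGN TABLE remote_tab1 OPTIONS (ADD fetch_size '500');
```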

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
