Re: Asynchronous Append on postgres_fdw nodes.

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "movead(dot)li" <movead(dot)li(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Asynchronous Append on postgres_fdw nodes.
Date: 2021-02-20 06:35:45
Message-ID: CAPmGK14dGe3i7kF_2tz59Gb11P+_HQOeeQLP_ER4BMN5nEJzYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 18, 2021 at 3:16 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Thu, 18 Feb 2021 11:51:59 +0900, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote in
> > I noticed that this doesn’t work for cases where ForeignScans are
> > executed inside functions, and I don’t have any simple solution for
>
> Ah, concurrent fetches in different plan trees? (For fairness, I
> hadn't noticed that case:p) The same can happen when an extension that
> is called via hooks.

Yeah, consider a plan containing a FunctionScan that invokes a query
like e.g., “SELECT * FROM foreign_table” via SPI.

> > So I’m getting back to what Horiguchi-san proposed for
> > postgres_fdw to handle concurrent fetches from a remote server
> > performed by multiple ForeignScan nodes that use the same connection.
> > As discussed before, we would need to create a scheduler for
> > performing such fetches in a more optimized way to avoid a performance
> > degradation in some cases, but that wouldn’t be easy.
>
> If the "degradation" means degradation caused by repeated creation of
> remote cursors, anyway every node on the same connection create its
> own connection named as "c<n>" and never "re"created in any case.
>
> If the "degradation" means that my patch needs to wait for the
> previous prefetching query to return tuples before sending a new query
> (vacate_connection()), it is just moving the wait from just before
> sending the new query to just before fetching the next round of the
> previous node. The only case it becomes visible degradation is where
> the tuples in the next round is not wanted by the upper nodes.

The latter. And yeah, typical cases where the performance degradation
occurs would be queries with LIMIT, as discussed in [1].

I’m not concerned about postgres_fdw modified to process an
in-progress fetch by a ForeignScan before starting a new
asynchronous/synchronous fetch by another ForeignScan using the same
connection. Actually, that seems pretty reasonable to me, so I’d like
to use that part in your patch in the next version. My concern is
that postgresIterateForeignScan() was modified to start another
asynchronous fetch from a remote table (if possible) right after doing
fetch_received_data() for the remote table, because aggressive
prefetching like that may increase the probability that ForeignScans
using the same connection conflict with each other, leading to a large
performance degradation. (Another issue with that would be that the
fsstate->tuples array for the remote table may be enlarged
indefinitely.)

Whether the degradation is acceptable or not would depend on the user,
and needless to say, the smaller degradation would be more acceptable.
So I’ll update the patch using your patch without the
postgresIterateForeignScan() change.

> > In his proposal,
> > postgres_fdw was modified to perform prefetching pretty aggressively,
> > so I mean removing aggressive prefetching. I think we could add it to
> > postgres_fdw later maybe as the server/table options.

> That was the natural extension from non-aggresive prefetching.

I also suppose that that would improve the performance in some cases.
Let’s leave that for future work.

> However, maybe we can live without that since if some needs more
> speed, it is enought to give every remote tables a dedicate
> connection.

Yeah, I think so too.

Thanks!

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/CAPmGK16E1erFV9STg8yokoewY6E-zEJtLzHUJcQx%2B3dyivCT%3DA%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-02-20 07:19:20 Re: Improvements and additions to COPY progress reporting
Previous Message Bharath Rupireddy 2021-02-20 06:09:22 Re: Improvements and additions to COPY progress reporting