Re: Asynchronous Append on postgres_fdw nodes.

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Asynchronous Append on postgres_fdw nodes.
Date: 2021-05-06 17:12:24
Message-ID: 20210506171224.GV20766@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Etsuro Fujita (etsuro(dot)fujita(at)gmail(dot)com) wrote:
> On Thu, Mar 4, 2021 at 1:00 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> > Another thing I'm concerned about in the postgres_fdw part is the case
> > where all/many postgres_fdw ForeignScans of an Append use the same
> > connection, because in that case those ForeignScans are executed one
> > by one, not in parallel, and hence the overhead of async execution
> > (i.e., doing ExecAppendAsyncEventWait()) would merely cause a
> > performance degradation. Here is such an example:
> >
> > postgres=# create server loopback foreign data wrapper postgres_fdw
> > options (dbname 'postgres');
> > postgres=# create user mapping for current_user server loopback;
> > postgres=# create table pt (a int, b int, c text) partition by range (a);
> > postgres=# create table loct1 (a int, b int, c text);
> > postgres=# create table loct2 (a int, b int, c text);
> > postgres=# create table loct3 (a int, b int, c text);
> > postgres=# create foreign table p1 partition of pt for values from
> > (10) to (20) server loopback options (table_name 'loct1');
> > postgres=# create foreign table p2 partition of pt for values from
> > (20) to (30) server loopback options (table_name 'loct2');
> > postgres=# create foreign table p3 partition of pt for values from
> > (30) to (40) server loopback options (table_name 'loct3');
> > postgres=# insert into p1 select 10 + i % 10, i, to_char(i, 'FM00000')
> > from generate_series(0, 99999) i;
> > postgres=# insert into p2 select 20 + i % 10, i, to_char(i, 'FM00000')
> > from generate_series(0, 99999) i;
> > postgres=# insert into p3 select 30 + i % 10, i, to_char(i, 'FM00000')
> > from generate_series(0, 99999) i;
> > postgres=# analyze pt;
> >
> > postgres=# set enable_async_append to off;
> > postgres=# select count(*) from pt;
> > count
> > --------
> > 300000
> > (1 row)
> >
> > Time: 366.905 ms
> >
> > postgres=# set enable_async_append to on;
> > postgres=# select count(*) from pt;
> > count
> > --------
> > 300000
> > (1 row)
> >
> > Time: 385.431 ms
>
> I think the user should be careful about this. How about adding a
> note about it to the “Asynchronous Execution Options” section in
> postgres-fdw.sgml, like the attached?

I'd suggest the language point out that it's not actually possible to do
otherwise, since they all need to be part of the same transaction.

Without that, it looks like we're just missing a trick somewhere and
someone might think that they could improve PG to open multiple
connections to the same remote server to execute them in parallel.

Maybe:

In order to ensure that the data being returned from a foreign server
is consistent, postgres_fdw will only open one connection for a given
foreign server and will run all queries against that server sequentially
even if there are multiple foreign tables involved. In such a case, it
may be more performant to disable this option to eliminate the overhead
associated with running queries asynchronously.

... then again, it'd really be better if we could figure out a way to
just do the right thing here. I haven't looked at this in depth but I
would think that the overhead of async would be well worth it just about
any time there's more than one foreign server involved. Is it not
reasonable to have a heuristic where we disable async in the cases where
there's only one foreign server, but have it enabled all the other time?
While continuing to allow users to manage it explicitly if they want.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2021-05-06 17:13:47 Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
Previous Message Isaac Morland 2021-05-06 16:37:39 Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'