Re: Asynchronous Append on postgres_fdw nodes.

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, "movead(dot)li" <movead(dot)li(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Asynchronous Append on postgres_fdw nodes.
Date: 2021-02-14 11:06:57
Message-ID: CAPmGK14tsj-kYYb4z4CUVkiTRzWH0uAGW7a6STbYt9VMvdUKvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 12, 2021 at 5:30 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> It seems too specific to async Append so I look it as a PoC of the
> mechanism.

Are you saying that the patch only reconsiders async ForeignScans?

> It creates a hash table that keyed by connection umid to record
> planids run on the connection, triggerd by core planner via a dedicate
> API function. It seems to me that ConnCacheEntry.state can hold that
> and the hash is not needed at all.

I think a good thing about the hash table is that it can be used by
other FDWs that support async execution in a similar way to
postgres_fdw, so they don’t need to create their own hash tables. But
I’d like to know about the idea of using ConnCacheEntry. Could you
elaborate a bit more about that?

> | postgresReconsiderAsyncForeignScan(ForeignScanState *node, AsyncContext *acxt)
> | {
> | ...
> | /*
> | * If the connection used for the ForeignScan node is used in other parts
> | * of the query plan tree except async subplans of the parent Append node,
> | * disable async execution of the ForeignScan node.
> | */
> | if (!bms_is_subset(fsplanids, asyncplanids))
> | return false;
>
> This would be a reasonable restriction.

Cool!

> | /*
> | * If the subplans of the Append node are all async-capable, and use the
> | * same connection, then we won't execute them asynchronously.
> | */
> | if (requestor->as_nasyncplans == requestor->as_nplans &&
> | !bms_nonempty_difference(asyncplanids, fsplanids))
> | return false;
>
> It is the correct restiction? I understand that the currently
> intending restriction is one connection accepts at most one FDW-scan
> node. This looks somethig different...

People put multiple partitions in a remote PostgreSQL server in
sharding, so the patch allows multiple postgres_fdw ForeignScans
beneath an Append that use the same connection to be executed
asynchronously like this:

postgres=# create table t1 (a int, b int, c text);
postgres=# create table t2 (a int, b int, c text);
postgres=# create table t3 (a int, b int, c text);
postgres=# create foreign table p1 (a int, b int, c text) server
server1 options (table_name 't1');
postgres=# create foreign table p2 (a int, b int, c text) server
server2 options (table_name 't2');
postgres=# create foreign table p3 (a int, b int, c text) server
server2 options (table_name 't3');
postgres=# create table pt (a int, b int, c text) partition by range (a);
postgres=# alter table pt attach partition p1 for values from (10) to (20);
postgres=# alter table pt attach partition p2 for values from (20) to (30);
postgres=# alter table pt attach partition p3 for values from (30) to (40);
postgres=# insert into p1 select 10 + i % 10, i, to_char(i, 'FM0000')
from generate_series(0, 99) i;
postgres=# insert into p2 select 20 + i % 10, i, to_char(i, 'FM0000')
from generate_series(0, 99) i;
postgres=# insert into p3 select 30 + i % 10, i, to_char(i, 'FM0000')
from generate_series(0, 99) i;
postgres=# analyze pt;

postgres=# explain verbose select count(*) from pt;
QUERY PLAN
------------------------------------------------------------------------------------------
Aggregate (cost=314.25..314.26 rows=1 width=8)
Output: count(*)
-> Append (cost=100.00..313.50 rows=300 width=0)
-> Async Foreign Scan on public.p1 pt_1
(cost=100.00..104.00 rows=100 width=0)
Remote SQL: SELECT NULL FROM public.t1
-> Async Foreign Scan on public.p2 pt_2
(cost=100.00..104.00 rows=100 width=0)
Remote SQL: SELECT NULL FROM public.t2
-> Async Foreign Scan on public.p3 pt_3
(cost=100.00..104.00 rows=100 width=0)
Remote SQL: SELECT NULL FROM public.t3
(9 rows)

For this query, p2 and p3, which use the same connection, are scanned
asynchronously!

But if all the subplans of an Append are async postgres_fdw
ForeignScans that use the same connection, they won’t be parallelized
at all, and the overhead of async execution may cause a performance
degradation. So the patch disables async execution of them in that
case using the above code bit.

Thanks for the review!

Best regards,
Etsuro Fujita

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-02-14 11:22:03 Re: pg_cryptohash_final possible out-of-bounds access (per Coverity)
Previous Message Peter Geoghegan 2021-02-14 06:47:13 Re: 64-bit XIDs in deleted nbtree pages