Re: Append with naive multiplexing of FDWs

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bruce(at)momjian(dot)us
Cc: robertmhaas(at)gmail(dot)com, thomas(dot)munro(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, sfrost(at)snowman(dot)net
Subject: Re: Append with naive multiplexing of FDWs
Date: 2019-12-12 12:40:57
Message-ID: 20191212.214057.2013694839251103876.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Hello.

I think I can say that this patch doesn't slow down non-AsyncAppend,
non-postgres_fdw scans.

At Mon, 9 Dec 2019 12:18:44 -0500, Bruce Momjian <bruce(at)momjian(dot)us> wrote in
> Certainly any overhead on normal queries would be unacceptable.

I took performance numbers on the current shape of the async execution
patch for the following scan cases.

t0 : single local table (parallel disabled)
pll : local partitioning (local Append, parallel disabled)
ft0 : single foreign table
pf0 : inheritance on 4 foreign tables, single connection
pf1 : inheritance on 4 foreign tables, 4 connections
ptf0 : partition on 4 foreign tables, single connection
ptf1 : partition on 4 foreign tables, 4 connections

The benchmarking system is configured as follows, on a single
machine.

          [ benchmark client ]
             |            |
     (localhost:5433)  (localhost:5432)
             |            |
      [master server]  [async server]
          |      ^        |      ^
          +-fdw--+        +-fdw--+

The patch works roughly in the following steps.

1. The planner decides how many children of an Append can run
asynchronously (such children are called async-capable).

2. At ExecInit time, if an Append has no async-capable children,
ExecAppend, which is exactly the same function as before, is set
as ExecProcNode. Otherwise ExecAppendAsync is used.

If the infrastructure part of the patch causes any degradation, the
"t0" test (scan on a single local table) and/or the "pll" test (scan
on a locally partitioned table) would get slower.

3. postgres_fdw always runs the async-capable code path.

If the postgres_fdw part causes degradation, ft0 reflects that.

The tables have two integer columns, and the query computes sum(a)
over all tuples.

With the default fetch_size = 100. Each number is run time in ms,
averaged over 14 runs.

        master  patched    gain
t0        7325     7130   +2.7%
pll       4558     4484   +1.7%
ft0       3670     3675   -0.1%
pf0       2322     1550  +33.3%
pf1       2367     1475  +37.7%
ptf0      2517     1624  +35.5%
ptf1      2343     1497  +36.2%

With a larger fetch_size (200), the gain mysteriously decreases in
the cases sharing a single connection (pf0, ptf0), but the others
don't seem to change much.

        master  patched    gain
t0        7212     7252   -0.6%
pll       4546     4397   +3.3%
ft0       3712     3731   -0.5%
pf0       2131     1570  +26.4%
pf1       1926     1189  +38.3%
ptf0      2001     1557  +22.2%
ptf1      1903     1193  +37.4%

FWIW, attached are the test scripts.

gentblr2.sql: Table creation script.
testrun.sh : Benchmarking script.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
