Re: Async execution of postgres_fdw.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-19 06:24:16
Message-ID: 20150119.152416.57998318.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, that's a silly mistake. fetch_seize = 10000 in the v4
patch. This v5 patch is fixed at the point.

> But the v4 patch mysteriously accelerates this query, 6.5 seconds.
>
> > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> > FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
...
> > Execution time: 6512.043 ms

fetch_size was 10000 at this run. I got about 13.0 seconds for
fetch_size = 100, which is about 19% faster than the original.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

=======
15 17:18:49 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20150116(dot)171849(dot)109146500(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> I revised the patch so that async scan will be done more
> aggressively, and took execution time for two very simple cases.
>
> As the result, simple seq scan gained 5% and hash join of two
> foreign tables gained 150%. (2.4 times faster).
>
> While measuring the performance, I noticed that each scan in a
> query runs at once rather than alternating with each other in
> many cases such as hash join or sorted joins and so. So I
> modified the patch so that async fetch is done more
> aggressively. The new v4 patch is attached. The following numbers
> are taken based on it.
>
> ========
> Simple seq scan for the first test.
>
> > CREATE TABLE lt1 (a int, b timestamp, c text);
> > CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost');
> > CREATE USER MAPPING FOR PUBLIC SERVER sv1;
> > CREATE FOREIGN TABLE ft1 () SERVER sv1 OPTIONS (table_name 'lt1');
> > INSERT INTO lt1 (SELECT a, now(), repeat('x', 128) FROM generate_series(0, 999999) a);
>
> On this case, I took the the 10 times average of exec time of the
> following query for both master head and patched version. The
> fetch size is 100.
>
> > postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
> > QUERY PLAN
> > ------------------------------------------------------------------
> > Foreign Scan on ft1 (actual time=0.79 5..4175.706 rows=1000000 loops=1)
> > Planning time: 0.060 ms
> > Execution time: 4276.043 ms
>
> master head : avg = 4256.621, std dev = 17.099
> patched pgfdw: avg = 4036.463, std dev = 2.608
>
> The patched version is faster by about 5%. This should be pure
> result of asynchronous fetching, not including the effect of
> early starting of remote execution in ExecInit.
>
> Interestingly, as fetch_count gets larger, the gain raises in
> spite of the decrease of the number of query sending.
>
> master head : avg = 2622.759, std dev = 38.379
> patched pgfdw: avg = 2277.622, std dev = 27.269
>
> About 15% gain. And for 10000,
>
> master head : avg = 2000.980, std dev = 6.434
> patched pgfdw: avg = 1616.793, std dev = 13.192
>
> 19%.. It is natural that exec time reduces along with increase of
> fetch size, but I haven't found the reason why the patch's gain
> also increases.
>
> ======================
>
> The second case is a simple join of two foreign tables sharing
> one connection.
>
> The master head runs this query in about 16 seconds with almost
> no fluctuation among multiple tries.
>
> > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> > FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
> > QUERY PLAN
> > ----------------------------------------------------------------------------
> > Hash Join (actual time=7541.831..15924.631 rows=1000000 loops=1)
> > Hash Cond: (x.a = y.a)
> > -> Foreign Scan on ft1 x (actual time=1.176..6553.480 rows=1000000 loops=1)
> > -> Hash (actual time=7539.761..7539.761 rows=1000000 loops=1)
> > Buckets: 32768 Batches: 64 Memory Usage: 2829kB
> > -> Foreign Scan on ft1 y (actual time=1.067..6529.165 rows=1000000 loops=1)
> > Planning time: 0.223 ms
> > Execution time: 15973.916 ms
>
> But the v4 patch mysteriously accelerates this query, 6.5 seconds.
>
> > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> > FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
> > QUERY PLAN
> > ----------------------------------------------------------------------------
> > Hash Join (actual time=2556.977..5812.937 rows=1000000 loops=1)
> > Hash Cond: (x.a = y.a)
> > -> Foreign Scan on ft1 x (actual time=32.689..1936.565 rows=1000000 loops=1)
> > -> Hash (actual time=2523.810..2523.810 rows=1000000 loops=1)
> > Buckets: 32768 Batches: 64 Memory Usage: 2829kB
> > -> Foreign Scan on ft1 y (actual time=50.345..1928.411 rows=1000000 loops=1)
> > Planning time: 0.220 ms
> > Execution time: 6512.043 ms
>
> The result data seems not broken. I don't know the reason yet but
> I'll investigate it.

Attachment Content-Type Size
0001-Asynchronous-execution-of-postgres_fdw-v5.patch text/x-patch 37.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Abhijit Menon-Sen 2015-01-19 06:54:24 Re: pgaudit - an auditing extension for PostgreSQL
Previous Message Amit Kapila 2015-01-19 06:08:32 Re: TABLESAMPLE patch