Re: asynchronous execution

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject: Re: asynchronous execution
Date: 2017-10-20 08:37:07
Message-ID: 20171020.173707.12913619.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello.

A fully asynchronous executor requires every node to be stateful and
suspendable at the point where it requests the next tuple from the
nodes underneath it. I tried a pure push-based executor but failed.
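To illustrate the "stateful and suspendable" requirement, here is a minimal sketch of a scan node whose progress lives entirely in an explicit struct rather than on the C call stack, so it can be suspended and resumed between tuples. The names (SuspendableScan, scan_next) are hypothetical and not part of the PostgreSQL executor:

```c
#include <stddef.h>

/* Illustrative sketch only: a node made suspendable by keeping all of
 * its progress in an explicit state struct instead of the call stack. */
typedef enum { SCAN_START, SCAN_RUNNING, SCAN_DONE } ScanPhase;

typedef struct SuspendableScan
{
    ScanPhase   phase;   /* where we resume from */
    int         pos;     /* saved cursor into the underlying data */
    const int  *data;
    int         ndata;
} SuspendableScan;

/* Returns the next value, or -1 when exhausted.  Because all progress
 * is saved in the struct, the caller may stop calling at any point and
 * resume later without losing position. */
static int
scan_next(SuspendableScan *s)
{
    switch (s->phase)
    {
        case SCAN_START:
            s->pos = 0;
            s->phase = SCAN_RUNNING;
            /* fall through */
        case SCAN_RUNNING:
            if (s->pos < s->ndata)
                return s->data[s->pos++];
            s->phase = SCAN_DONE;
            /* fall through */
        case SCAN_DONE:
            return -1;
    }
    return -1;
}
```

A real executor node has far more state (expression context, tuple slots, child pointers), but the principle is the same: nothing that must survive a suspension may live only in local variables.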

After the miserable patch upthread, I finally managed to make
executor nodes suspendable using computed jumps and got rid of the
recursive calls in the executor. But it runs about 10x slower in
the simple SeqScan case (pgbench ran with 9% degradation), and the
loss doesn't seem recoverable by handy improvements. So I gave up
on that approach.

Then I returned to single-level asynchrony, in other words, the
simple case where async-aware nodes sit directly above
async-capable nodes. The motivation for the framework in the
previous patch was that polluting ExecProcNode with async stuff
degraded the sync (or normal) code paths; as Tom suggested, the
node->ExecProcNode trick can isolate the async code path.
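The node->ExecProcNode trick can be sketched roughly like this: each node carries its own entry-point function pointer, so async-capable nodes install an async entry while ordinary nodes keep the plain one, and the sync path pays nothing for the async machinery. The structs and functions below are simplified stand-ins, not the actual executor code:

```c
#include <stddef.h>

/* Illustrative sketch of per-node dispatch: the caller makes one
 * indirect call and never branches on "is this node async?". */
struct Node;
typedef int (*ExecProcNodeFn) (struct Node *node);

typedef struct Node
{
    ExecProcNodeFn ExecProcNode;    /* per-node entry point */
    int         counter;            /* stand-in for real node state */
} Node;

/* The normal, synchronous path: untouched by async concerns. */
static int
ExecPlainNode(Node *node)
{
    return ++node->counter;
}

/* The async path: any extra bookkeeping (e.g. registering wait
 * events) lives only here, isolated from the sync path above. */
static int
ExecAsyncNode(Node *node)
{
    node->counter += 10;
    return node->counter;
}

/* Generic dispatcher: a single indirect call, no runtime branch. */
static int
ExecProcNode(Node *node)
{
    return node->ExecProcNode(node);
}
```

The isolation property is the point: adding or changing ExecAsyncNode cannot slow down plans that only ever install ExecPlainNode.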

The attached PoC patch theoretically has no impact on the normal
code paths and only brings gains in async cases. (The degradation
caused by the additional members in PlanState seems to come from
alignment, though.)

However, I haven't obtained sufficiently stable performance
results: different builds of the same source code give apparently
different results...

Anyway, here are the best results out of several runs:

                           original(ms)  patched(ms)  gain(%)
A: simple table scan     :      9714.70      9656.73      0.6
B: local partitioning    :      4119.44      4131.10     -0.3
C: single remote table   :      9484.86      9141.89      3.7
D: sharding (single con) :      7114.34      6751.21      5.1
E: sharding (multi con)  :      7166.56      1827.93     74.5

A and B are degradation checks and are expected to show no
degradation. C is the gain from postgres_fdw's command presending
alone, on a single remote table. D is the gain from sharding over a
single connection; the number of partitions/shards is 4. E is the
gain when using a dedicated connection per shard.
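The gap between D and E comes down to overlap: with a dedicated connection per shard, all shards can work concurrently and the total wait approaches the slowest shard, whereas a single shared connection serves shards one after another. A toy calculation (latency values made up purely for the arithmetic) shows the shape of the effect:

```c
#include <stddef.h>

/* Illustrative arithmetic only, not executor code.  With one shared
 * connection, per-shard waits add up; with one connection per shard,
 * the waits overlap and the slowest shard dominates. */
static int
total_wait_single_con(const int *latency, int n)
{
    int         sum = 0;

    for (int i = 0; i < n; i++)
        sum += latency[i];      /* shards served serially */
    return sum;
}

static int
total_wait_multi_con(const int *latency, int n)
{
    int         max = 0;

    for (int i = 0; i < n; i++)
        if (latency[i] > max)
            max = latency[i];   /* shards overlap; slowest dominates */
    return max;
}
```

With four shards of roughly equal latency, the serial total is about four times the overlapped one, which matches the rough shape of the D-vs-E numbers above.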

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch text/x-patch 9.4 KB
0002-core-side-modification.patch text/x-patch 25.0 KB
0003-async-postgres_fdw.patch text/x-patch 47.9 KB
