From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject: Re: asynchronous execution
Date: 2017-10-20 08:37:07
Message-ID: 20171020.173707.12913619.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers
Hello.
A fully asynchronous executor requires every node to be stateful and
suspendable at the point where it requests the next tuple from the
nodes beneath it. I tried a pure push-based executor but failed.
After the miserable patch upthread, I finally managed to make
executor nodes suspendable using computational jumps and got rid of
the recursive calls in the executor. However, it runs about 10x
slower for the simple SeqScan case (pgbench ran with 9% degradation),
and the loss does not seem recoverable by handy improvements. So I
gave up on that approach.
I then returned to single-level asynchrony, in other words, the
simple case of async-aware nodes placed directly above async-capable
nodes. The motive for the framework in the previous patch was that
polluting ExecProcNode with async stuff degraded the sync (or normal)
code paths; as Tom suggested, the node->ExecProcNode trick can
isolate the async code path. The attached PoC patch theoretically has
no impact on the normal code paths and brings gains only in the async
cases. (The additional members in PlanState still cause some
degradation, though, seemingly from alignment.)
However, I haven't obtained sufficiently stable performance results:
different builds of the same source code give apparently different
numbers... Anyway, here is the best of several runs.
                            original(ms)  patched(ms)  gain(%)
A: simple table scan      :      9714.70      9656.73      0.6
B: local partitioning     :      4119.44      4131.10     -0.3
C: single remote table    :      9484.86      9141.89      3.7
D: sharding (single con)  :      7114.34      6751.21      5.1
E: sharding (multi con)   :      7166.56      1827.93     74.5
A and B are degradation checks, which are expected to show no
degradation. C is the gain from postgres_fdw's command pre-sending
alone, on a single remote table. D is the gain from sharding over a
single connection; the number of partitions/shards is 4. E is the
gain when using a dedicated connection per shard.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment                                                     Content-Type  Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch  text/x-patch  9.4 KB
0002-core-side-modification.patch                                text/x-patch  25.0 KB
0003-async-postgres_fdw.patch                                    text/x-patch  47.9 KB