Quick Links

Re: [HACKERS] Runtime Partition Pruning

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc:	Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject:	Re: [HACKERS] Runtime Partition Pruning
Date:	2018-04-05 03:14:03
Message-ID:	8658bbec-9afe-8246-c0ff-254768b1bbe7@lab.ntt.co.jp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi David.

On 2018/03/31 22:52, David Rowley wrote:
> The attached patchset is based on Amit's v45 faster partition pruning [1].
>
> I've made a few changes since the v14 version. Since Amit's v45 patch
> now creates the partition pruning details in a data structure that can
> be copied from the planner over to the executor, it means this patch
> is now able to do much of the setup work for run-time pruning in the
> planner. Doing this now allows us to determine if run-time pruning is
> even possible at plan time, rather than the executor attempting it and
> sometimes wasting effort when it failed to find Params matching the
> partition key.
>
> Another change from the last version is that I've separated out the
> handling of exec Params and external Params. The new patch now will
> perform a pruning step at executor startup if some external params
> match the partition key. This is very beneficial to a PREPAREd OLTP
> type query against a partitioned table as it means we can skip
> sub-plan initialisation for all non-matching partitions.

This is quite nice.

> Initialising
> Append/MergeAppend/ModifyTable nodes with fewer subnodes than were
> originally planned did require a small change in explain.c in some
> code that was assuming the subnode arrays were the same length as the
> subplan list. I also ended up adding a Int property to EXPLAIN to show
> the number of subnodes that have been removed during execution.
> Unfortunately, there is a small corner case with this in the case
> where all subnodes are removed it leaves EXPLAIN with no subnodes to
> search for outer side Vars in. I didn't really see any sane way to
> handle this in EXPLAIN, so I think the best fix for this is what I've
> done, and that's just to always initalise at least 1 subnode, even
> when none match the external Params. Cost-wise, I don't think this is
> such a big deal as the cost savings here are coming from saving on
> initalising ten's or hundreds of subnodes, not 1.

I have wondered about the possibility of setting a valid (non-dummy)
targetlist in Append and MergeAppend if they're created for a partitioned
table. Currently they're overwritten by a dummy one using
set_dummy_tlist_references() in set_plan_refs() citing the following reason:

* set_dummy_tlist_references
* Replace the targetlist of an upper-level plan node with a simple
* list of OUTER_VAR references to its child.
*
* This is used for plan types like Sort and Append that don't evaluate
* their targetlists. Although the executor doesn't care at all what's in
* the tlist, EXPLAIN needs it to be realistic.

In fact, when I had noticed that this EXPLAIN behavior I had wondered if
that's something we should have discussed when d3cc37f1d801a [1] went in.

> To put the new patch to the test, I tried pgbench -S -M prepared -s
> 100 with and without having modified pgbench_accounts to separate into
> 10 RANGE partitions of equal size.
>
> A non-partitioned table was getting 12503 TPS.
> With partitioned tables, the old version of this patch was getting: 5470 TPS.
> With partitioned tables, the attached version gets 11247 TPS.
> For perspective, today's master with a partitioned table gets 4719 TPS.

Quite nice!

> So you can see it's a pretty good performance boost by skipping
> initialisation of the 9 non-matching subplans. It's not hard to
> imagine the gains getting more significant with a larger number of
> partitions. Ideally, the performance of a partitioned table would be
> faster than the non-partitioned case, but please remember this query
> is a single row PK lookup SELECT, so is a very fast query in both
> cases. Partitioning cases should improve more as the table grows and
> indexes struggle to fit in RAM, or when many rows are being taken from
> the partition and being aggregated.
>
> Unfortunately, the story is not quite as good for the non -M prepared
> version of the benchmark. This shows that planning time of partitioned
> table queries could still use some improvements. Amit's v45 patch
> certainly makes a dent in the planner slow performance here, but it's
> still nothing like as fast as the non-partitioned case. More work is
> required there in the future.

Certainly. Things like the issue that we both replied to yesterday should
be addressed for starters [2].

> This patchset also contains a patch to improve the performance of
> inheritance planning of UPDATE/DELETE queries. This patch also
> implements run-time pruning for UPDATE/DELETE too. This also has a
> significant performance improvement for planning of UPDATE/DELETE
> operations on partitioned tables with a large number of partitions,
> performance is as follows:
>
> Values in transactions per second.
>
> Partitions = 1
> Unpatched: 7323.3
> Patched: 6573.2 (-10.24%)
>
> Partitions = 2
> Unpatched: 6784.8
> Patched: 6377.1 (-6.01%)
>
> Partitions = 4
> Unpatched: 5903.0
> Patched: 6106.8 (3.45%)
>
> Partitions = 8
> Unpatched: 4582.0
> Patched: 5579.9 (21.78%)
>
> Partitions = 16
> Unpatched: 3131.5
> Patched: 4521.2 (44.38%)
>
> Partitions = 32
> Unpatched: 1779.8
> Patched: 3387.8 (90.35%)
>
> Partitions = 64
> Unpatched: 821.9
> Patched: 2245.4 (173.18%)
>
> Partitions = 128
> Unpatched: 322.2
> Patched: 1319.6 (309.56%)
>
> Partitions = 256
> Unpatched: 84.3
> Patched: 731.7 (768.27%)
>
> Partitions = 512
> Unpatched: 22.5
> Patched: 382.8 (1597.74%)
>
> Partitions = 1024
> Unpatched: 5.5
> Patched: 150.1 (2607.83%)

Great!

I will post comments on your v19 later today.

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d3cc37f1d801a

[2]
https://www.postgresql.org/message-id/20180403194613.GY28454%40telsasoft.com

In response to

Re: [HACKERS] Runtime Partition Pruning at 2018-03-31 13:52:09 from David Rowley

Responses

Re: [HACKERS] Runtime Partition Pruning at 2018-04-05 06:54:39 from David Rowley
Re: [HACKERS] Runtime Partition Pruning at 2018-04-05 12:54:41 from Amit Langote

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Rowley	2018-04-05 03:41:30	Re: Parallel Aggregates for string_agg and array_agg
Previous Message	Peter Geoghegan	2018-04-05 02:40:17	Re: WIP: Covering + unique indexes.