Re: Declarative partitioning

From: Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Declarative partitioning
Date: 2016-05-18 15:36:27
Message-ID: 573C8BFB.7050707@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi Amit and all,

On 18.05.2016 04:26, Amit Langote wrote:
> On 2016/05/18 2:22, Tom Lane wrote:
>> The two ways that we've dealt with this type of hazard are to copy data
>> out of the relcache before using it; or to give the relcache the
>> responsibility of not moving a particular portion of data if it did not
>> change. From memory, the latter applies to the tuple descriptor and
>> trigger data, but we've done most other things the first way.
> It seems that tuple descriptor is reference-counted; however trigger data
> is copied. The former seems to have been done on performance grounds (I
> found 06e10abc).
>
> So for a performance-sensitive relcache data structure, refcounting is the
> way to go (although done quite rarely)? In this particular case, it is a
> "partition descriptor" that could get big for a partitioned table with
> partitions in hundreds or thousands.
>
> Thanks,
> Amit

Here is an experimental patch that optimizes planning time for range
partitioned tables (it could be considered as a "proof of concept").
Patch should be applied on top of Amit's declarative partitioning patch.
It handles only a very special case (often used though) where
partitioning key consists of just a single attribute and doesn't contain
expressions.

The main idea is the following:
* we are looking for clauses like 'VAR OP CONST' (where VAR is
partitioning key attribute, OP is a comparison operator);
* using binary search find a partition (X) that fits CONST value;
* based on OP operator determine which partitions are also covered by
clause. There are possible cases:
1. If OP is '<' or '<=' then we need partitions standing left from X
(including)
2. If OP is '>' or '>=' then we need partitions standing right from
X (including)
3. If OP is '=' the we need only X partition
(for '<' and '>' operators we also check if CONST value is equal to a
lower or upper boundary (accordingly) and if it's true then exclude X).

For boolean expressions we evaluate left and right sides accordingly to
algorithm above and then based on boolean operator find intersection
(for AND) or union (for OR).

I run some benchmarks on:
1. original constraint exclusion mechanism,
2. optimized version (this patch) and
3. optimized version using relation->rd_partdesc pointer instead of
RelationGetPartitionDesc() function (see previous discussion).

Initial conditions:

CREATE TABLE abc (id SERIAL NOT NULL, a INT, dt TIMESTAMP) PARTITION BY
RANGE (a);
CREATE TABLE abc_1 PARTITION OF abc FOR VALUES START (0) END (1000);
CREATE TABLE abc_2 PARTITION OF abc FOR VALUES START (1000) END (2000);
...
etc
INSERT INTO %s (a) SELECT generate_series(0, <partitions_count> * 1000);

pgbench scripts:
https://gist.github.com/zilder/872e634a8eeb405bd045465fc9527e53 (where
:partitions is a number of partitions).
The first script tests fetching a single row from the partitioned table.
Results (tps):

# of partitions | constraint excl. | optimized | optimized (using
pointer)
----------------+------------------+---------------+----------------------------
100 | 658 | 2906 | 3079
1000 | 45 | 2174 | 3021
2000 | 22 | 1667 | 2919

The second script tests fetching all data from a single partition.
Results (tps):

# of partitions | constraint excl. | optimized | optimized (using
pointer)
----------------+------------------+---------------+----------------------------
100 | 317 | 1001 | 1051
1000 | 34 | 941 | 1023
2000 | 15 | 813 | 1016

Optimized version works much faster on large amount of partitions and
degradates slower than constraint exclusion. But still there is a
noticeable performance degradation from copying PartitionDesc structure:
with 2000 partitions RelationGetPartitionDesc() function spent more than
40% of all execution time on copying in first benchmark (measured with
`perf`). Using reference counting as Amit suggests will allow to
significantily decrease performance degradation.

Any comments and suggestions are very welcome. Thanks!

--
Ildar Musin
i(dot)musin(at)postgrespro(dot)ru

Attachment Content-Type Size
20160518_decl_part_opt.patch text/x-patch 19.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message chang chao 2016-05-18 15:42:15 Re: The rewritting of join conditions caused a very slow query plan.
Previous Message Joe Conway 2016-05-18 14:09:51 Re: Reviewing freeze map code