Partitioning WIP patch (was: Partitioning: issues/ideas)

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Partitioning WIP patch (was: Partitioning: issues/ideas)
Date: 2015-02-24 08:13:42
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21-01-2015 PM 07:26, Amit Langote wrote:
> Ok, I will limit myself to focusing on following things at the moment:
> * Provide syntax in CREATE TABLE to declare partition key
> * Provide syntax in CREATE TABLE to declare a table as partition of a
> partitioned table and values it contains
> * Arrange to have partition key and values stored in appropriate
> catalogs (existing or new)
> * Arrange to cache partitioning info of partitioned tables in relcache

Here is an experimental patch that attempts to implement this.

It implements the following syntax:

* Syntax for defining partition key:
CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec);

where key_spec consists of partition key column names and optional
operator class per column. Currently, there are restrictions on the
key_spec such as allowing only column names (not arbitrary expressions
of them), only one column for list strategy, etc.

* Syntax for declaring a table as partition of a partitioned table:
CREATE TABLE table_name PARTITION OF parent_name FOR VALUES values_clause;

where values_clause can be:

IN (list_of_values), or
BETWEEN (range_lower_bounds) AND (range_upper_bounds);

The semantics for a range is [range_lower_bounds,range_upper_bounds),
that is, lower inclusive, upper exclusive. (this might later change
subject to choice regarding preferred/desired syntax)

Additionally, a partition can itself be further partitioned (though I
have not worked on the implementation details of multilevel partitioning

{RANGE|LIST} ON(key_columns) FOR VALUES values_clause;

There are two system catalogs pg_partitioned_rel and pg_partition which
store partition key spec and partition value spec, respectively.
(remember to initdb if interested in trying)

Please note that the above syntax and/or catalog may not be very
appealing nor that they won't change/disappear. I am posting the patch
more for examining the approach of internal representation of the
metadata for partitioning and get some general comments.

The approach I have implemented in this patch consists of loading
the partition key info and a list of partitions into the relation
descriptor for a partitioned table. In case of range partitioning, this
list is sorted on the range max bound. To see if that works any good, I
hacked ExecInsert() to make it find a partition for a tuple by
adding a ExecFindPartition() just before heap_insert(). It accepts
resultRelInfo of the parent and a tuple slot. It binary-searches for and
returns the descriptor of the chosen partition which ExecInsert() then
uses to perform heap_insert() and inserting index tuples. If no
partition is found, parent relation itself is used. heap_insert() and
ExecInsertIndexTuples() are the only things for which partition relation
is used. All of this is just experimental and most probably wrong in
details; but is done just to see what kind of performance to expect from
the chosen internal representation. Another thing is the approach that
tuple-routing (& partition-pruning) is a function of partitioned
relation and the tuple (or restrict quals). It will be more significant
when we'll get to implementing a partition-pruning function.

See below an example to show that having an extra ExecFindPartition()
does not degrade the performance of inserting a tuple much:

-- a plain table
CREATE TABLE parent_monthly(year int, month int, day int);

-- a partitioned table
-- xxxxx: number of partitions
CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY
RANGE ON(year, month);

-- partitions
CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

CREATE TABLE parent_monthly_xxxxx_201402 PARTITION OF
parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3);

CREATE TABLE parent_monthly_xxxxx_201403 PARTITION OF
parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4);


CREATE TABLE parent_monthly_xxxxx_yyyymm PARTITION OF
parent_monthly_00100_yyyymm FOR VALUES BETWEEN (yyyy, mm AND (yyyy, mm);

-- insert 1 tuple into the plain table
INSERT INTO parent_monthly VALUES (2013, 12, 01);
Time: 3.303 ms

-- insert 1 tuple into the partitioned table
-- #part: number of partitions
-- case 1: find no valid partition
-- case 2: find a valid partition

#parts case 1 case 2
======== ======== ========
10 3.248 ms 3.509 ms
100 3.546 ms 3.269 ms
500 3.497 ms 3.048 ms
1000 3.364 ms 5.379 ms
10000 4.943 ms 5.076 ms



Attachment Content-Type Size
20140224_partitions-tuple-routing-poc_v01.patch text/x-diff 92.1 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2015-02-24 08:35:27 Re: Partitioning WIP patch (was: Partitioning: issues/ideas)
Previous Message Michael Paquier 2015-02-24 08:06:02 Re: improving speed of make check-world