Re: Declarative partitioning grammar

From: Jeff Cohen <jcohen(at)greenplum(dot)com>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: Warren Turkal <turkal(at)google(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Gavin Sherry <swm(at)alcove(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Declarative partitioning grammar
Date: 2008-01-15 03:18:43
Message-ID: B5181252-B6E4-413B-9CBA-1D4355A3ADFA@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Jan 14, 2008, at 1:49 AM, Markus Schiltknecht wrote:

> I don't think the separation into list, hash and range partitioning
> is adequate. What is the system supposed to do, if you try to
> insert a row which doesn't fit any of the values in your list or
> doesn't fit any of the ranges you defined?

Hi Markus,

If you don't define a "default" partition to handle outliers, the
insert should fail with an error.

> I prefer a partitioning grammar which doesn't interfere with
> constraints. We all know how to define constraints. Please don't
> introduce a new, ambiguous way. A partitioning definition should be
> able to tell the target partition for *every* row which satisfies
> the constraints (the real ones, not ambiguous ones).
>
> IMO, a single DDL command should only touch a single split point,
> i.e. split a table into two partitions, move the split point or
> remove the split point (joining the partitions again). Those are
> the only basic commands you need to be able to handle partitioning.

I can certainly appreciate the simplicity of this approach. It lets
us use a generic check constraint to perform partitioning, so it is
more general than partitioning using hash, list, and range. However,
it achieves this generality at the expense of usability for typical
customer cases. For example, let's look at the case of a table of 1
year of sales data, where we want to create 12 partitions -- one for
each month.

With the generic approach, you start with a single table, and start
by splitting it into two six-month partitions:

ALTER TABLE sales
SPLIT where sales_date > date '2007-06-01'
INTO
(
PARTITION first_half
PARTITION second_half
);

We could implement this approach using check constraints and table
inheritance: the partition second_half is a child table where
sales_date > date '2007-06-01', and the partition first_half has the
complementary constraint NOT(sales_date > date '2007-06-01').

Next, you split each partition:

ALTER TABLE sales
SPLIT PARTITION first_half where sales_date > date '2007-03-01'
INTO
(
PARTITION first_quarter
PARTITION second_quarter
);

So now the child table for first_half itself has two children. As
you continue this process you construct a binary tree of table
inheritance using 12 ALTER statements.

In the "long" grammar you can create and partition the table in one
statement:

CREATE TABLE sales
...
PARTITION BY sales_date
(
start (date '2007-01-01') end (date '2008-01-01')
every (interval '1 month')
);

> Sorry, but for my taste, the proposed grammar is too long per
> command, not flexible enough and instead ambiguous for split points
> as well as for constraints. To me it looks like repeating the
> mistakes of others.

Thanks for your feedback. Partitioning the table using series of
splits is a clever solution for situations where the partitioning
operation cannot be described using simple equality (like list,hash)
or ordered comparison (range). But for many common business cases,
the "long" grammar is easier to specify.

kind regards,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-01-15 03:24:06 Re: SSL over Unix-domain sockets
Previous Message Jaime Casanova 2008-01-15 03:12:55 Re: could not open relation: Invalid argument