Re: On partitioning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-08-29 16:35:50
Message-ID: 20746.1409330150@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> [ partition sketch ]

> In this design, partitions are first-class objects, not normal tables in
> inheritance hierarchies. There are no pg_inherits entries involved at all.

Hm, actually I'd say they are *not* first-class objects; the problem with
the existing design is exactly that child tables *are* first-class
objects. This is merely a terminology quibble though.

> * relkind RELKIND_PARTITION 'p' indicates a partition within a partitioned
> relation (its parent). These cannot be addressed directly in DML
> queries and only limited DDL support is provided. They don't have
> their own pg_attribute entries either and therefore they are always
> identical in column definitions to the parent relation.

Not sure that not storing the pg_attribute rows is a good thing; but
that's something that won't be clear till you try to code it.

> Each partition is assigned an Expression that receives a tuple and
> returns boolean. This expression returns true if a given tuple belongs
> in it, false otherwise.

-1, in fact minus a lot. One of the core problems of the current approach
is that the system, particularly the planner, hasn't got a lot of insight
into exactly what the partitioning scheme is in a partitioned table built
on inheritance. If you allow the partitioning rule to be a black box then
that doesn't get any better. I want to see a design wherein the system
understands *exactly* what the partitioning behavior is. I'd start with
supporting range-based partitioning explicitly, and maybe we could add
other behaviors such as hashing later.

In particular, there should never be any question at all that there is
exactly one partition that a given row belongs to, not more, not less.
You can't achieve that with a set of independent filter expressions;
a meta-rule that says "exactly one of them should return true" is an
untrustworthy band-aid.
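To make the objection concrete, here is a minimal, hypothetical Python sketch (not code from the thread or from PostgreSQL) in which each partition carries an opaque per-partition filter, as in the quoted design. The partition names and key column `k` are invented for illustration. Nothing in this arrangement forces the filters to be disjoint or exhaustive, so one row can match two partitions and another can match none:

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

# Hypothetical per-partition black-box filters. The system cannot see
# inside them, so it cannot verify "exactly one matches" for every row.
filters: Dict[str, Callable[[Row], bool]] = {
    "p_low": lambda r: r["k"] < 100,
    "p_mid": lambda r: 50 <= r["k"] < 200,  # overlaps p_low for k in 50..99
    "p_high": lambda r: r["k"] >= 300,      # leaves a gap for k in 200..299
}


def matching_partitions(row: Row) -> List[str]:
    """Return every partition whose filter accepts the row."""
    return [name for name, accept in filters.items() if accept(row)]


if __name__ == "__main__":
    for k in (10, 75, 250):
        print(k, matching_partitions({"k": k}))
```

Only the row with `k = 10` lands in exactly one partition; `k = 75` matches two and `k = 250` matches none, which is the kind of ambiguity a meta-rule would have to paper over.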

(This does not preclude us from mapping the tuple through the partitioning
rule and finding that the corresponding partition doesn't currently exist.
I think we could view the partitioning rule as a function from tuples to
partition numbers, and then we look in pg_class to see if such a partition
exists.)
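One way to picture "a function from tuples to partition numbers" for range partitioning is a sorted list of boundary values searched with binary search. The sketch below is a hypothetical illustration, not PostgreSQL code: `BOUNDS` and `EXISTING` are invented names, and the dictionary merely stands in for the pg_class lookup. Because the ranges are defined by one sorted list, every key maps to exactly one partition number by construction; whether that partition currently exists is a separate question:

```python
import bisect
from typing import Dict, List, Optional

# Hypothetical range rule: partition 0 holds k < 100, partition 1 holds
# 100 <= k < 200, partition 2 holds 200 <= k < 300, partition 3 holds k >= 300.
BOUNDS: List[int] = [100, 200, 300]

# Stand-in for looking up the partition in pg_class (hypothetical names).
# Partition 2 has not been created yet.
EXISTING: Dict[int, str] = {0: "t_p0", 1: "t_p1", 3: "t_p3"}


def partition_number(key: int, bounds: List[int] = BOUNDS) -> int:
    """Map a key to exactly one partition number."""
    # bisect_right counts the bounds <= key, which is the range index.
    return bisect.bisect_right(bounds, key)


def route(key: int) -> Optional[str]:
    """Return the partition's name, or None if it doesn't currently exist."""
    return EXISTING.get(partition_number(key))
```

For example, `route(150)` yields `"t_p1"`, while `route(250)` maps cleanly to partition 2 and returns `None` because that partition hasn't been created, which is exactly the case the parenthetical above allows for.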

> Additionally, each partitioned relation may have a master expression.
> This receives a tuple and returns an integer, which corresponds to the
> number of the partition it belongs to.

I guess this might be the same thing I'm arguing for, except that I say
it is not optional but is *the* way you define the partitioning. And
I don't really want black-box expressions even in this formulation.
If you're looking for arbitrary partitioning rules, you can keep on
using inheritance. The point of inventing partitioning, IMHO, is for
the system to have a lot more understanding of the behavior than is
possible now.

As an example of the point I'm trying to make, the planner should be able
to discard range-based partitions that are eliminated by a WHERE clause
with something a great deal cheaper than the theorem prover it currently
has to use for the purpose. Black-box partitioning rules not only don't
improve that situation, they actually make it worse.
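As a rough illustration of the cheaper alternative (a hypothetical sketch, not how the PostgreSQL planner actually does it), if the system knows the range boundaries, a clause such as `WHERE k >= lo AND k < hi` can be turned into the set of surviving partition numbers with two binary searches, O(log n) in the number of partitions, with no per-partition proof step. The sketch assumes integer keys and the same hypothetical `BOUNDS` layout as a sorted boundary list:

```python
import bisect
from typing import List

# Hypothetical boundaries: partition i covers [BOUNDS[i-1], BOUNDS[i]),
# with open-ended first and last partitions.
BOUNDS: List[int] = [100, 200, 300]


def partitions_for_range(lo: int, hi: int, bounds: List[int] = BOUNDS) -> range:
    """Partition numbers that may contain integer keys k with lo <= k < hi."""
    if lo >= hi:
        return range(0)  # the clause can't match anything
    first = bisect.bisect_right(bounds, lo)
    # For integer keys, k < hi is the same as k <= hi - 1.
    last = bisect.bisect_right(bounds, hi - 1)
    return range(first, last + 1)
```

For instance, `WHERE k >= 150 AND k < 250` keeps only partitions 1 and 2; every other partition is discarded without examining it.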

Other than that, this sketch seems reasonable ...

regards, tom lane
