Re: On partitioning

From: "Amit Langote" <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'Andres Freund'" <andres(at)2ndquadrant(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Bruce Momjian'" <bruce(at)momjian(dot)us>, "'Pg Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-11-11 01:53:07
Message-ID: 066501cffd52$3d919390$b8b4bab0$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Saturday, November 08, 2014 5:41 AM
>
> I'd be in favor of that.

Thanks!

> I am not sure whether the code is close
> enough to what we need to be really useful, but that's for you to
> decide.

Hmm, I'm not entirely convinced about the patch as it stands either but, I will
try to restate below what the patch in its current state does anyway (just to
refresh):

The patch provides syntax to:
* Specify partitioning key, optional partition definitions within CREATE TABLE,
* A few ALTER TABLE commands that let you define a partitioning key
(partitioning a table after the fact), attach/detach an existing table as a
partition of a partitioned table,
* CREATE PARTITION to create a new partition on a partitioned table.

Above commands are merely transformed into ALTER TABLE subcommands that arrange
partitioned table and partitions into inheritance hierarchy, but with extra
information, that is, allowed values for the partition in a new anyarray column
called 'pg_inherits.values'. A special case of ATExecAddInherit() namely
ATExecAttachPartitionI(), as part of its processing, also adds partition
constraints in the form of appropriate CHECK constraints. So, a few of the
manual steps are automated and additional (IMHO non-opaque) metadata (namely
partition boundaries/list values) is added.

Additionally, defining a partitioning key (PARTITION BY) creates a pg_partition
entry that specifies for a partitioned table the following - partition kind
(range/list), an opclass for the key value comparison and a key 'expression'
(say, "colname % 10").

A few key things I can think of as needing improvement would be (perhaps just
reiterating a review of the patch):

* partition pruning would still depend on constraint exclusion using the CHECK
constraints (same old)
* there is no tuple-routing at all (same can be said of partition pruning
above)
* partition pruning or tuple-routing would require a scan over pg_inherits
(perhaps inefficient)
* partitioning key is an expression which might not be a good idea in early
stages of the implementation (might be better off with just the attnum of the
column to partition on?)
* there is no DROP PARTITION (in fact, it is suggested not to go CREATE/DROP
PARTITION route at all) -> ALTER TABLE ... ADD/DROP PARTITION?

Some other important ones:
* dependency handling related oversights
* constraint propagation related oversights

And then some of the oddities of behaviour that I am seeing while trying out
things that the patch does. Please feel free to suggest those that I am not
seeing. I am sure these improvements need more than just tablecmds.c hacking
which is what the current patch mostly does.

The first two points could use separate follow-on patches as I feel they need
extensive changes unless I am missing something. I will try to post possible
solutions to these issues provided metadata in current form is OK to proceed.

> In my view, the main problem we should be trying to solve
> here is "avoid relying on constraint exclusion". In other words, the
> syntax for adding a partition should put some metadata into the system
> catalogs that lets us do partitioning pruning very very quickly,
> without theorem-proving. For example, for list or range partitioning,
> a list of partition bounds would be just right: you could
> binary-search it. The same metadata should also be suitable for
> routing inserts to the proper partition, and handling partition motion
> when a tuple is updated.
>
> Now there's other stuff we might very well want to do, but I think
> making partition pruning and tuple routing fast would be a pretty big
> win by itself.
>

Those are definitely the goals worth striving for.

Thanks for your time.

Regards,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2014-11-11 02:02:10 Re: Add CREATE support to event triggers
Previous Message Tom Lane 2014-11-11 01:52:59 Re: Proposal: Log inability to lock pages during vacuum