Re: Declarative partitioning

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning
Date: 2016-02-24 18:31:20
Message-ID: CADkLM=f1zyL-cV_LK8GpXv=iS_=VbHs8bernUMa+8yHtUXjebQ@mail.gmail.com
Lists: pgsql-hackers

>
> Hm, I see. How about multi-column keys? Do we care enough about that use
> case? I don't see a nice-enough-looking range literal for such keys.
> Consider for instance,
>
> With the partitioned table defined as:
>
> CREATE TABLE foo(c1 char(2), c2 char(2)) PARTITION BY RANGE (c1, c2);
>

Good question! I would assume that we'd use a syntax that presumes c1 and
c2 are a hypothetical composite type. But what does that look like?

To answer it, I tried this:

# create type duple as (a text, b text);
CREATE TYPE
# create type duplerange as range (subtype = duple);
CREATE TYPE
# select '(beebop,alula)'::duple;
duple
----------------
(beebop,alula)
(1 row)

# select '("hey ho","letsgo")'::duple;
duple
-------------------
("hey ho",letsgo)
(1 row)

analytics=# select duplerange('(beebop,alula)','("hey ho","letsgo")','(]');
duplerange
------------------------------------------
("(beebop,alula)","(""hey ho"",letsgo)"]
(1 row)

So I would assume that we'd use a syntax that presumed the columns were in
a composite range type.

Which means your CREATE statements would look like this (following Robert
Haas's implied suggestion that we leave off the string literal quotes):

CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES ( , (b,2) );
CREATE TABLE foo_ax2x PARTITION OF foo FOR VALUES [ (b,2), (b,3) );
CREATE TABLE foo_ax3x PARTITION OF foo FOR VALUES [ (b,3), (b,4) );
CREATE TABLE foo_ax4x PARTITION OF foo FOR VALUES [ (b,4), (c,2) );
CREATE TABLE foo_ax5x PARTITION OF foo FOR VALUES [ (c,2), (c,3) );
CREATE TABLE foo_ax6x PARTITION OF foo FOR VALUES [ (c,3), (c,4) );

That's not terrible looking.
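
Carrying the duple experiment a bit further, routing a row into one of those
partitions should amount to a containment test with the existing range
machinery. I haven't actually run this, but something like:

select '(b,27)'::duple <@ duplerange('(b,2)', '(b,3)', '[)');
-- expect true: 'b' = 'b', and as text '27' sorts between '2' and '3'

ought to say that a row with key (b,27) lands in the second partition above.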

> We would also want to think about what subset of the many permutations of
> this syntax to accept as range specs for new partitions. Mostly to preserve
> the non-overlapping invariant, and I think it would also be nice to prevent
> gaps.
>

Gaps *might* be intentional. I can certainly see where we'd want to set up
warnings for discontinuity, or perhaps some utility functions:
pg_partitions_ranges_are_continuous('master_table_name')
pg_partitions_are_adjacent('master_table_name','p1','p2')
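
Just to sketch the first of those: if there were some way to get each
partition's bounds back as a range value, say a hypothetical
pg_partition_bounds(master regclass) set-returning function that does not
exist today, continuity is just the existing adjacency operator applied
pairwise:

-- rough sketch only; pg_partition_bounds() is hypothetical
create function pg_partitions_ranges_are_continuous(master regclass)
returns boolean language sql as $$
    select coalesce(bool_and(bounds -|- next_bounds), true)
    from (select bounds,
                 lead(bounds) over (order by bounds) as next_bounds
          from pg_partition_bounds(master)) b
    where next_bounds is not null;
$$;

pg_partitions_are_adjacent() would be the same idea applied to just the two
named partitions' bounds.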

But for the most part, range partitions evolve from splits when one
partition grows too big, so that won't be such a problem.

> Consider that once we create:
>
> PARTITION FOR VALUES [current_date,);
>
> Now to create a new partition starting at later date, we have to have a
> "split partition" feature which would scan the above partition to
> distribute the existing data rows appropriately to the resulting two
> partitions. Right?
>

Correct. And depending on policy, that might be desirable and might not be.
If the table were for death records, we'd very much want to reject rows with
dates in the future, if only to avoid upsetting the person.
If the table were of movie release dates, we'd *expect* that only dates in
(,current_date] would be entered, but if someone chose to leak a release
date, we'd want to capture that and deal with it later.
So yeah, we're going to (eventually) need a SPLIT PARTITION that migrates
rows to a new partition.
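
To make that concrete, the sort of thing I'd eventually like to be able to
write is something like this (completely invented syntax, with made-up
partition names, not anything in the current patch):

-- hypothetical: scan foo_current, move rows >= the split point into the
-- second partition, and shrink the first partition's range to match
ALTER TABLE foo
    SPLIT PARTITION foo_current AT ( (2016-07-01) )
    INTO ( PARTITION foo_pre_2016h2, PARTITION foo_2016h2_onward );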

> IOW, one shouldn't create an unbounded partition if more partitions in the
> unbounded direction are expected to be created. It would be OK for
> unbounded partitions to be on the lower end most of the times.
>

On this I'll have to disagree. My own use case for my range_partitioning
extension starts off with a single all-encompassing partition (), and all new
partitions are splits of that. The ranges evolve over time as partitions grow
and slow down. It's nice because we're not trying to predict where growth
will be; we split where growth is.

>
> > p.s. Sorry I haven't been able to kick the tires just yet. We have a very
> > good use case for this, it's just a matter of getting a machine and the
> > time to devote to it.
>
> I would appreciate it. You could wait a little more for my next
> submission which will contain some revisions to the tuple routing code.
>
>

Ok, I'll wait a bit. In the meantime I can tell you a bit about the existing
production system I'm hoping to replicate with true partitioning. It looks
like this:

Big Master Table:
    Range partition by C collated text
        Date Range
        Date Range
        ...
    Range partition by C collated text
        Date Range
        Date Range
        ...
    ...

Currently this is accomplished through my range_partitioning module, and
then using pg_partman on those partitions. It works, but it's a lot of
moving parts.
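
Assuming the new partitioning allows a partition to itself be partitioned
(and hand-waving on the exact range literal syntax), I'd hope to collapse all
of that into something roughly like this, with table and column names
invented for illustration:

CREATE TABLE big_master (
    label    text COLLATE "C",
    created  date,
    payload  text
) PARTITION BY RANGE (label);

-- each label range is itself partitioned by date
CREATE TABLE big_master_a PARTITION OF big_master
    FOR VALUES [ (a), (f) )
    PARTITION BY RANGE (created);

CREATE TABLE big_master_a_2016q1 PARTITION OF big_master_a
    FOR VALUES [ (2016-01-01), (2016-04-01) );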

The machine will be a 32-core AWS box. As per usual with AWS, it will have
ample memory and CPU, and be somewhat starved for I/O.

Question: I haven't dived into the code yet, but I was curious about your
tuple routing algorithm. Is there any way for the algorithm to begin its scan
of candidate partitions at the destination of the last row inserted by this
statement? I ask because most use cases (that I am aware of) have data that
would naturally cluster in the same partition.
