Quick Links

Re: Declarative partitioning

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Declarative partitioning
Date:	2016-03-08 01:39:23
Message-ID:	56DE2D4B.1040001@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Corey,

Sorry for replying so late.

On 2016/02/25 3:31, Corey Huinker wrote:
>

[ ... ]

> So I would assume that we'd use a syntax that presumed the columns were in
> a composite range type.
>
> Which means your creates would look like (following Robert Haas's implied
> suggestion that we leave off the string literal quotes):
>
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES ( , (b,2) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,2), (b,3) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,3), (b,4) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,4), (c,2) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,2), (c,3) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,3), (c,4) );
>
> That's not terrible looking.

So I tried a grammar that looked like the following:

range_spec: lb_inc bound ',' bound ub_inc {<create-range-struct>}

lb_inc: '[' { $$ = true; } | '(' { $$ = false; }
ub_inc: ']' { $$ = true; } | ')' { $$ = false; }

bound: a_expr
{
if (IsA($1, RowExpr)) /* (key1, key2, ...) */
$$ = ((RowExpr) $1)->args;
else /* key */
$$ = list_make1($1);
}
| /* EMPTY */ { $$ = NIL; }

Everything seemed to go dandy until I tried FOR VALUES (blah , blah],
where psql wouldn't send the command string without accepting the closing
parenthesis, :(. So maybe I should try to put the whole thing in '', that
is, accept the full range_spec in a string, but then we are back to
requiring full-blown range parse function which I was trying to avoid by
using the aforementioned grammar. So, I decided to move ahead with the
following grammar for time being:

START (lower-bound) [ EXCLUSIVE ]
| END (upper-bound) [ INCLUSIVE ]
| START (lower-bound) [ EXCLUSIVE ] END (upper-bound) [ INCLUSIVE ]

Where,

*-bound: a_expr
| *-bound ',' a_expr

Note that in the absence of explicit specification, lower-bound is
inclusive and upper-bound is exclusive.

So the verbosity couldn't be helped unless accepting range literal in
string form and exporting rangetypes.c:range_parse() with range
partitioning-specific hacks (consider composite bounds) to parse it are
acceptable things.

>> IOW, one shouldn't create an unbounded partition if more partitions in the
>> unbounded direction are expected to be created. It would be OK for
>> unbounded partitions to be on the lower end most of the times.
>>
>
> On this I'll have to disagree. My own use case where I use my
> range_partitioning extension starts off with a single partition () and all
> new partitions are splits of that. The ranges evolve over time as
> partitions grow and slow down. It's nice because we're not trying to
> predict where growth will be, we split where growth is.

Okay, perhaps I should not presume a certain usage. However, as you know,
the usage like yours requires some mechanism of data redistribution (also
not without some syntax), which I am not targeting with the initial patch.
If that was the only way of creating partitions, matters would be a
little easier - you only specify a split point and have some rule about
inclusivity around the split point. But we have to start with the usage
where each new partition is separately created with explicit partition
bound specification that is complete in itself and that's where the logic
to check partition invariants may get a bit complicated.

> Ok, I'll wait a bit. In the mean time I can tell you a bit about the
> existing production system I'm hoping to replicate in true partitioning
> looks like this:
>
> Big Master Table:
> Range partition by C collated text
> Date Range
> Date Range
> ...
> Range partition by C collated text
> Date Range
> Date Range
> ...
> ...
>
> Currently this is accomplished through my range_partitioning module, and
> then using pg_partman on those partitions. It works, but it's a lot of
> moving parts.
>
> The machine will be a 32 core AWS box. As per usual with AWS, it will be
> have ample memory and CPU, and be somewhat starved for I/O.
>
> Question: I haven't dove into the code, but I was curious about your tuple
> routing algorithm. Is there any way for the algorithm to begin it's scan of
> candidate partitions based on the destination of the last row inserted this
> statement? I ask because most use cases (that I am aware of) have data that
> would naturally cluster in the same partition.

No. Actually the tuple-routing function starts afresh for each row. For
range partitions, it's binary search over an array of upper bounds. There
is no row-to-row state caching in the partition module itself.

Thanks,
Amit

In response to

Re: Declarative partitioning at 2016-02-24 18:31:20 from Corey Huinker

Responses

Re: Declarative partitioning at 2016-03-09 00:17:36 from Corey Huinker
Re: Declarative partitioning at 2016-03-09 14:19:20 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2016-03-08 01:39:30	Allowing to run a buildfarm animal under valgrind
Previous Message	Peter Geoghegan	2016-03-08 01:09:21	Re: Freeze avoidance of very large table.