Re: Declarative partitioning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Declarative partitioning
Date: 2016-01-25 10:32:53
Message-ID: 56A5F9D5.4040208@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

On 2016/01/23 3:42, Corey Huinker wrote:
>> So for now, you create an empty partitioned table specifying all the
>> partition keys without being able to define any partitions in the same
>> statement. Partitions (and partitions thereof, if any) will be created
>> using CREATE PARTITION statements, one for each.
>>
>
> ...and I would assume that any attempt to insert into a partitioned table
> with no partitions (or lacking partitions at a defined level) would be an
> error? If so, I'd be ok with that.

Yes. Basically, for a tuple insert to succeed, there should exist a leaf
partition that the inserted tuple would map to.

>> Specifying range partitioning bound as PARTITION FOR RANGE <range-literal>
>> sounds like it offers some flexibility, which can be seen as a good thing.
>> But it tends to make internal logic slightly complicated.
>>
>> Whereas, saying PARTITION FOR VALUES LESS THAN (max1, max2, ...) is
>> notationally simpler still and easier to work with internally. Also, there
>> will be no confusion about exclusivity of the bound if we document it so.
>
>
> I understand wanting the internal rules to be simple. Oracle clearly went
> with VALUES LESS THAN waterfalls for that reason.
>
> What I'm hoping to avoid is:
> - having to identify my "year2014" partition by VALUES LESS THAN
> '2015-01-01', a bit of cognitive dissonance defining data by what it's not.
> - and then hoping that there is a year2013 partition created by someone
> with similar sensibilities, the partition definition being incomplete
> outside of the context of other partition definitions.
> - and then further hoping that nobody drops the year2013 partition, thus
> causing new 2013 rows to fall into the year2014 partition, a side effect of
> an operation that did not mention the year2014 partition.

I somewhat agree with these concerns. So whenever a left range partition
is deleted, a partition would begin accepting tuples that would have been
mapped to the partition just deleted. Then as you say, partition would
contain data that does not really fit the name. A keen user could then
probably rename the partition to something suitable in the same
transaction. I'd imagine that a partition deleted in this way would be the
leftmost (oldest partition) in most scenarios and there wouldn't be any
future tuples that would map to such partition. Admittedly, that may just
be wishful thinking though.

On the other hand, giving user the ability to specify both min and max
requires a lot of bookkeeping on the system's part. One of my previous
patches had that syntax and I wasn't quite happy with overall flakiness of
the implementation. Granted it would have been slightly easier if I had
used range type machinery and I tried a little but didn't stick with it
for too long. You may have read the following thread:

http://www.postgresql.org/message-id/15387.1423496163@sss.pgh.pa.us

> Range types do that, and if we're concerned about range type overhead,
> we're only dealing with the ranges at DDL time, we can break down the ATR
> rules into a more easily digestible form once the partition is modified.

We still end up with min, max pair per partition along with the
inclusivity of both as being part of the persisted partition rule.

> Range continuity can be tested with -|-, but we'd only need to test for
> overlaps: gaps in ranges are sometimes a feature, not a bug (ex: I don't
> want any rows from future dates and we weren't in business before 1997).
>
> Also, VALUES LESS THAN forces us to use discrete values. There is no way
> with to express with VALUES LESS THAN partitions that have float values for
> temperature:
> ice (,0.0), water [0.0,212.0], steam (212.0,3000.0], plasma (3000.0,).
>
> Yes, I can calculate the day after the last day in a year, I can use
> 212.0000000001, I can write code to rigorously check that all partitions
> are in place. I'd just rather not.

Hmm, this may be a concern.

I have read discussion threads of range types development and one of the
contending issues was around semantics for continuous types.

So here, if all we have is VALUES LESS THAN to define a range partition,
one cannot assign ranges to partitions that cannot be reduced to [) form,
right?

> p.s. I'm really excited about what this will bring to Postgres in general
> and my organization in particular. This feature alone will help chip away
> at our needs for Vertica and Redshift clusters. Let me know if there's
> anything I can do to help.

I am really glad to hear that. Your comments so far are much appreciated.

Thanks,
Amit

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Artur Zakirov 2016-01-25 11:33:05 Re: easy way of copying regex_t
Previous Message Tomas Vondra 2016-01-25 10:07:17 Re: easy way of copying regex_t