Re: Declarative partitioning - another take

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-11-01 18:01:55
Message-ID: CA+TgmoZou4ApEvC_nfhOxsi5G4SoD_evwNaiYn60ZcJ4XB_-QQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 1, 2016 at 1:49 PM, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
> Exactly. This is especially true for date ranges. There's a lot of cognitive
> dissonance in defining the "2014" partition as < '2015-01-01', as was the
> case in Oracle waterfall-style partitioning. That was my reasoning for
> pushing for range-ish syntax as well as form.

Yeah. That syntax has some big advantages, though. If we define that
partition as START ('2014-01-01') INCLUSIVE END ('2014-12-31')
INCLUSIVE, there's no way for the system to tell that the there's no
gap between the that ending bound and the starting bound of the 2015
partition, because the system has no domain-specific knowledge that
there is no daylight between 2014-12-31 and 2015-01-01. So if we
allow things to be specified that way, then people will use that
syntax and then complain when it doesn't perform quite as well as
START ('2014-01-01') END ('2015-01-01'). Maybe the difference isn't
material and maybe we don't care; what do you think?

(I really don't want to get tied up adding a system for adding and
subtracting one to and from arbitrary data types. Life is too short.
If that requires that users cope with a bit of cognitive dissidence,
well, it's not the first time something like that will have happened.
I have some cognitive dissidence about the fact that creat(2) has no
trailing "e" but truncate(2) does, and moreover the latter can be used
to make a file longer rather than shorter. But, hey, that's what you
get for choosing a career in computer science.)

>> But who wants that? People who are doing prefix-based partitioning of
>> their text keys are going to want all of the 'a' things together, and
>> all of the 'b' things in another category. Same for ranges of
>> floating-point numbers, which are also probably an unlikely candidate
>> for a partitioning key anyway.
>
> /me raises hand. We have tables with a taxonomy in them where the even data
> splits don't fall on single letter boundaries, and often the single string
> values have more rows than entire letters. In those situations, being able
> to express ['XYZ','XYZ'] is important. ['XYZ,'XZ') would let 'XYZ1' bleed
> into the partition and ['XYZ','XYZ1') lets in other values, and so I go
> chasing down the non-discrete set rabbit hole.

Hmm. I have to admit that I hadn't considered the case where you have
a range partitioning scheme but one of the ranges includes only a
single string. If that's an important use case, that might be a fatal
problem with my proposal. :-(

> If we're worried about keywords, maybe a BOUNDED '[]' clause?

In the end, keywords are not the defining issue here; the issue is
whether all of this complexity around inclusive and exclusive bounds
carries its weight, and whether we want to be committed to that.

Any other opinions out there?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Francisco Olarte 2016-11-01 18:05:00 Re: Declarative partitioning - another take
Previous Message Robert Haas 2016-11-01 17:53:11 Re: Declarative partitioning - another take