Re: Declarative partitioning - another take

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Declarative partitioning - another take
Date: 2016-12-14 14:54:11
Message-ID: e24d6ad7-20af-27e4-4234-ab6df6692177@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit,

On 12/13/2016 09:45 AM, Amit Langote wrote:
> On 2016/12/13 0:17, Tomas Vondra wrote:
>> On 12/12/2016 07:37 AM, Amit Langote wrote:
>>>
>>> Hi Tomas,
>>>
>>> On 2016/12/12 10:02, Tomas Vondra wrote:
>>>>
>>>> 2) I'm wondering whether having 'table' in the catalog name (and also in
>>>> the new relkind) is too limiting. I assume we'll have partitioned indexes
>>>> one day, for example - do we expect to use the same catalogs?
>>>
>>> I am not sure I understand your idea of partitioned indexes, but I doubt
>>> it would require entries in the catalog under consideration. Could you
>>> perhaps elaborate more?
>>>
>>
>> OK, let me elaborate. Let's say we have a partitioned table, and I want to
>> create an index. The index may be either "global" i.e. creating a single
>> relation for data from all the partitions, or "local" (i.e. partitioned
>> the same way as the table).
>>
>> Local indexes are easier to implement (it's essentially what we have now,
>> except that we need to create the indexes manually for each partition),
>> and don't work particularly well for some use cases (e.g. unique
>> constraints). This is what I mean by "partitioned indexes".
>>
>> If the index is partitioned just like the table, we probably won't need to
>> copy the partition key info (so, nothing in pg_partitioned_table).
>> I'm not sure it makes sense to partition the index differently than the
>> table - I don't see a case where that would be useful.
>>
>> The global indexes would work better for the unique constraint use case,
>> but it clearly contradicts our idea of TID (no information about which
>> partition that references).
>>
>> So maybe the catalog really only needs to track info about tables? Not
>> sure. I'm just saying it'd be unfortunate to have _table in the name, and
>> end up using it for indexes too.
>
> Hmm, I didn't quite think of the case where the index is partitioned
> differently from the table, but perhaps that's possible with some other
> databases.
>

I haven't thought about that very deeply either, so perhaps it's an
entirely silly idea. Also, probably quite complex to implement I guess,
so unlikely to be pursued soon.

> What you describe as "local indexes" or "locally partitioned indexes" is
> something I would like to see being pursued in the near term. In that
> case, we would allow defining indexes on the parent that are recursively
> defined on the partitions and marked as inherited index, just like we have
> inherited check constraints and NOT NULL constraints. I have not studied
> whether we could implement (globally) *unique* indexes with this scheme
> though, wherein the index key is a superset of the partition key.
>

I think implementing UNIQUE constraint with local indexes is possible
and possibly even fairly simple, but it likely requires SHARE lock on
all partitions, which is not particularly nice.

When the partition key is referenced in the constraint, that may allow
locking only a subset of the partitions, possibly even a single one. But
with multi-level partitioning schemes that may be difficult.

Also, I don't think it's very likely to have the partitioning key as
part of the unique constraint. For example 'users' table is unlikely to
be distributed by 'login' and so on.

The global indexes make this easier, because there's just a single
index to check. But of course, attaching/detaching partitions gets more
expensive.

Anyway, starting a detailed discussion about local/global indexes was
not really what I meant to do.

>> Clearly, this is a consequence of building the partitioning on top of
>> inheritance (not objecting to that approach, merely stating a fact).
>>
>> I'm fine with whatever makes the error messages more consistent, if it
>> does not make the code significantly more complex. It's a bit confusing
>> when some use 'child tables' and others 'partitions'. I suspect even a
>> single DML command may return a mix of those, depending on where exactly
>> it fails (old vs. new code).
>
> So, we have mostly some old DDL (CREATE/ALTER TABLE) and maintenance
> commands that understand inheritance. All of the their error messages
> apply to partitions as well, wherein they will be referred to as "child
> tables" using old terms. We now have some cases where the commands cause
> additional error messages for only partitions because of additional
> restrictions that apply to them. We use "partitions" for them because
> they are essentially new error messages.
>
> There won't be a case where single DML command would mix the two terms,
> because we do not allow mixing partitioning and regular inheritance.
> Maybe I misunderstood you though.
>

Don't we call inheritance-related functions from the new DDL? In that
case we'd fail with 'child tables' error messages in the old code, and
'partitions' in the new code. I'd be surprised if there was no such code
reuse, but I haven't checked.

>> Am I right that one of the ambitions of the new partitioning is to improve
>> behavior with large number of partitions?
>
> Yes. Currently, SELECT planning is O(n) with significantly large constant
> factor. It is possible now to make it O(log n). Also, if we can do away
> with inheritance_planner() treatment for the *partitioned tables* in case
> of UPDATE/DELETE, then that would be great. That would mean their
> planning time would be almost same as the SELECT case.
>
> As you might know, we have volunteers to make this happen sooner [1], :)
>

Yes, I know. And it's great that you've managed to make the first step,
getting all the infrastructure in, allowing others to build on that.
Kudos to you!

>> At first I thought it's somewhat related to the FDW sharding (each node
>> being a partition and having local subpartitions), but I realize the
>> planner will only deal with the node partitions I guess.
>
> Yeah, planner would only have the local partitioning metadata at its
> disposal. Foreign tables can only be leaf partitions, which if need to be
> scanned for a given query, will be scanned using a ForeignScan.
>

Right, makes sense. Still, I can imagine for example having many daily
partitions and not having to merge them regularly just to reduce the
number of partitions.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2016-12-14 14:57:23 Re: pg_authid.rolpassword format (was Re: Password identifiers, protocol aging and SCRAM protocol)
Previous Message Peter Eisentraut 2016-12-14 14:52:48 Re: pg_authid.rolpassword format (was Re: Password identifiers, protocol aging and SCRAM protocol)