Re: Declarative partitioning - another take

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-11-15 19:21:59
Message-ID: CA+Tgmoarz2hkxoNddx9Ng_HMeoq4G8Q-R-teQtwmE8nCxkAJow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 15, 2016 at 5:30 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> On 2016/11/11 19:30, Amit Langote wrote:
>>
>> Attached updated patches.
>
> Here is the latest version of the patches with some fixes along with those
> mentioned below (mostly in 0003):
>
> - Fixed the logic to skip the attach partition validation scan such that
> it won't skip scanning a list partition *that doesn't accept NULLs* if
> the partition key column is not set NOT NULL (it similarly doesn't skip
> scanning a range partition if either of the partition key columns is not
> set NOT NULL, because a range partition key cannot contain NULLs at all)
>
> - Added some more regression tests for ATTACH PARTITION command
>
> - Some fixes to documentation and source code comments

Have you done any performance testing on the tuple routing code?
Suppose we insert a million (or 10 million) tuples into an
unpartitioned table, a table with 10 partitions, a table with 100
partitions, a table with 1000 partitions, and a table that is
partitioned into 10 partitions each of which has 10 subpartitions.
Ideally, the partitioned cases would run almost as fast as the
unpartitioned case, but probably there will be some overhead.
However, it would be useful to know how much. Also, it would be
useful to set up the same cases with inheritance using a PL/pgsql ON
INSERT trigger for tuple routing and compare. Hopefully the tuple
routing code is far faster than a trigger, but we should make sure
that's the case and look for optimizations if not. Also, it would be
useful to know how much slower the tuple-mapping-required case is than
the no-tuple-mapping-required case.

I think the comments in some of the later patches could use some work
yet. For example, in 0007, FormPartitionKeyDatum()'s header comment
is largely uninformative, get_partition_for_tuple()'s header comment
doesn't explain what the return value means in the non-failure case,
and ExecFindPartition() doesn't have a header comment at all.

The number of places in these patches where you end up reopening a
hopefully-already-locked relation with NoLock (or sometimes with
AccessShareLock) is worrying to me. I think that's a coding pattern
we should be seeking to avoid; every one of those is not only a hazard
(what if we reach that point in the code without a lock?) but a
possible performance issue (we have to look up the OID in the
backend-private hash table; and if you are passing AccessShareLock
then you might also hit the lock manager which could be slow or create
deadlock hazards). It would be much better to pass the Relation
around rather than the OID whenever possible.

Also, in 0006:

- I doubt that PartitionTreeNodeData's header comment will survive
contact with pgindent.

In 0007:

- oid_is_foreign_table could/should do a syscache lookup instead of
opening the heap relation. But actually I think you could drop it
altogether and use get_rel_relkind().

- The XXX in EndCopy will need to be fixed somehow.

- I suspect that many of the pfree's in this patch are pointless
because the contexts in which those allocations are performed will be
reset or deleted shortly afterwards anyway. Only pfree data that
might otherwise live substantially longer than we want, because pfree
consumes some cycles.

- The code in make_modifytable() to swap out the rte_array for a fake
one looks like an unacceptable kludge. I don't know offhand what a
better design would look like, but what you've got is really ugly.

- I don't understand how it can be right for get_partition_for_tuple()
to be emitting an error that says "range partition key contains null".
A defective partition key should be detected at partition creation
time, not in the middle of tuple routing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-11-15 19:22:55 Re: Snapshot too old logging
Previous Message Kevin Grittner 2016-11-15 19:21:05 Re: Snapshot too old logging