Re: Declarative partitioning - another take

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-11-17 11:27:13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2016/11/16 4:21, Robert Haas wrote:
> On Tue, Nov 15, 2016 at 5:30 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> On 2016/11/11 19:30, Amit Langote wrote:
>>> Attached updated patches.
> Have you done any performance testing on the tuple routing code?
> Suppose we insert a million (or 10 million) tuples into an
> unpartitioned table, a table with 10 partitions, a table with 100
> partitions, a table with 1000 partitions, and a table that is
> partitioned into 10 partitions each of which has 10 subpartitions.
> Ideally, the partitioned cases would run almost as fast as the
> unpartitioned case, but probably there will be some overhead.
> However, it would be useful to know how much. Also, it would be
> useful to set up the same cases with inheritance using a PL/pgsql ON
> INSERT trigger for tuple routing and compare. Hopefully the tuple
> routing code is far faster than a trigger, but we should make sure
> that's the case and look for optimizations if not. Also, it would be
> useful to know how much slower the tuple-mapping-required case is than
> the no-tuple-mapping-required case.

OK, I will share the performance results soon.

Meanwhile, here are updated patch that address most of the following comments.

> I think the comments in some of the later patches could use some work
> yet. For example, in 0007, FormPartitionKeyDatum()'s header comment
> is largely uninformative, get_partition_for_tuple()'s header comment
> doesn't explain what the return value means in the non-failure case,
> and ExecFindPartition() doesn't have a header comment at all.

Sorry, I have tried to make these comments more informative in the attached.

> The number of places in these patches where you end up reopening a
> hopefully-already-locked relation with NoLock (or sometimes with
> AccessShareLock) is worrying to me. I think that's a coding pattern
> we should be seeking to avoid; every one of those is not only a hazard
> (what if we reach that point in the code without a lock?) but a
> possible performance issue (we have to look up the OID in the
> backend-private hash table; and if you are passing AccessShareLock
> then you might also hit the lock manager which could be slow or create
> deadlock hazards). It would be much better to pass the Relation
> around rather than the OID whenever possible.

I have removed most instances of heap_open() calls after realizing they
were in fact unnecessary (and as you warn possibly hazardous and
performance hogs). Where it's inevitable that we perform heap_open(), I
have changed it so that the correct lockmode is passed.

> Also, in 0006:
> - I doubt that PartitionTreeNodeData's header comment will survive
> contact with pgindent.

Fixed by adding "/* ----" at the top of the comment.

> In 0007:
> - oid_is_foreign_table could/should do a syscache lookup instead of
> opening the heap relation. But actually I think you could drop it
> altogether and use get_rel_relkind().

Done with get_rel_relkind().

> - The XXX in EndCopy will need to be fixed somehow.

Fixed. Realized that doing that in EndCopy() is kind of pointless, I
moved that code to the end of CopyFrom(), right before a call to

> - I suspect that many of the pfree's in this patch are pointless
> because the contexts in which those allocations are performed will be
> reset or deleted shortly afterwards anyway. Only pfree data that
> might otherwise live substantially longer than we want, because pfree
> consumes some cycles.

They do seem to be freeing unnecessarily, removed.

> - The code in make_modifytable() to swap out the rte_array for a fake
> one looks like an unacceptable kludge. I don't know offhand what a
> better design would look like, but what you've got is really ugly.

Agree that it looks horrible. The problem is we don't add partition
(child table) RTEs when planning an insert on the parent and FDW
partitions can't do without some planner handling - planForeignModify()
expects a valid PlannerInfo for deparsing target lists (basically, to be
able to use planner_rt_fetch()).

Any design beside this kludge would perhaps mean that we will be adding
valid partition RTEs at some earlier planning stage much like what
expand_inherited_rtentry() does during non-insert queries. Perhaps, we
could have expand_inherited_rtentry() make an exception in partitioned
tables' case and create root->append_rel_list members even in the insert
case. We could then go over the append_rel_list in make_modifytable() to
get the valid RT index of foreign child tables to pass to
PlanForeignModify(). Using append_rel_list for insert planning is perhaps
equally ugly though. Thoughts?

> - I don't understand how it can be right for get_partition_for_tuple()
> to be emitting an error that says "range partition key contains null".
> A defective partition key should be detected at partition creation
> time, not in the middle of tuple routing.

That error is emitted when the partition key of the *input tuple* is found
to contain NULLs. Maybe we need to consider something like how btree
indexes treat NULLs - order NULLs either before or after all the non-NULL
values based on NULLS FIRST/LAST config. Thoughts?


Attachment Content-Type Size
0001-Catalog-and-DDL-for-partitioned-tables-15.patch text/x-diff 115.3 KB
0002-psql-and-pg_dump-support-for-partitioned-tables-15.patch text/x-diff 23.9 KB
0003-Catalog-and-DDL-for-partitions-15.patch text/x-diff 208.2 KB
0004-psql-and-pg_dump-support-for-partitions-15.patch text/x-diff 22.1 KB
0005-Teach-a-few-places-to-use-partition-check-quals-15.patch text/x-diff 30.9 KB
0006-Introduce-a-PartitionTreeNode-data-structure-15.patch text/x-diff 8.2 KB
0007-Tuple-routing-for-partitioned-tables-15.patch text/x-diff 43.4 KB
0008-Update-DDL-Partitioning-chapter-to-reflect-new-devel-15.patch text/x-diff 24.7 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Mithun Cy 2016-11-17 11:33:15 Re: Patch: Implement failover on libpq connect level.
Previous Message Ashutosh Bapat 2016-11-17 11:13:27 Re: pg_hba_file_settings view patch