Re: Declarative partitioning - another take

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: 高增琦 <pgf00a(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Declarative partitioning - another take
Date: 2017-01-06 11:23:15
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2017/01/05 3:26, Robert Haas wrote:
> On Tue, Dec 27, 2016 at 8:41 PM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> On 2016/12/27 19:07, Amit Langote wrote:
>>> Attached should fix that.
>> Here are the last two patches with additional information like other
>> patches. Forgot to do that yesterday.
> 0001 has the disadvantage that get_partition_for_tuple() acquires a
> side effect. That seems undesirable. At the least, it needs to be
> documented in the function's header comment.

That's true. How about we save away the original ecxt_scantuple at entry
and restore the same just before returning from the function? That way
there would be no side effect. 0001 implements that.

> It's unclear to me why we need to do 0002. It doesn't seem like it
> should be necessary, it doesn't seem like a good idea, and the commit
> message you proposed is uninformative.

If a single BulkInsertState object is passed to
heap_insert()/heap_multi_insert() for different heaps corresponding to
different partitions (from one input tuple to next), tuples might end up
going into wrong heaps (like demonstrated in one of the reports [1]). A
simple solution is to disable bulk-insert in case of partitioned tables.

But my patch (or its motivations) was slightly wrongheaded, wherein I
conflated multi-insert stuff and bulk-insert considerations. I revised
0002 to not do that.

However if we disable bulk-insert mode, COPY's purported performance
benefit compared with INSERT is naught. Patch 0003 is a proposal to
implement bulk-insert mode even for partitioned tables. Basically,
allocate separate BulkInsertState objects for each partition and switch to
the appropriate one just before calling heap_insert()/heap_multi_insert().
Then to be able to use heap_multi_insert(), we must also manage buffered
tuples separately for each partition. Although, I didn't modify the limit
on number of buffered tuples and/or size of buffered tuples which controls
when we pause buffering and do heap_multi_insert() on buffered tuples.
Maybe, it should work slightly differently for the partitioned table case,
like for example, increase the overall limit on both the number of tuples
and tuple size in the partitioning case (I observed that increasing it 10x
or 100x helped to some degree). Thoughts on this?



Attachment Content-Type Size
0001-Set-ecxt_scantuple-correctly-for-tuple-routing.patch text/x-diff 4.5 KB
0002-No-multi-insert-and-bulk-insert-when-COPYing-into-pa.patch text/x-diff 2.6 KB
0003-Support-bulk-insert-mode-for-partitioned-tables.patch text/x-diff 11.0 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-01-06 11:28:59 Re: UNDO and in-place update
Previous Message Simon Riggs 2017-01-06 10:40:03 Re: logical decoding of two-phase transactions