Re: Declarative partitioning vs. BulkInsertState

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: 高增琦 <pgf00a(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dmitry Ivanov <d(dot)ivanov(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Declarative partitioning vs. BulkInsertState
Date: 2017-01-23 10:25:32
Message-ID: f25849aa-ce41-47d6-ed2b-a1d663d484c2@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/01/19 5:25, Robert Haas wrote:
> On Wed, Jan 11, 2017 at 10:53 PM, Amit Langote wrote:
>> On 2017/01/06 20:23, Amit Langote wrote:
>>>
>>> If a single BulkInsertState object is passed to
>>> heap_insert()/heap_multi_insert() for different heaps corresponding to
>>> different partitions (from one input tuple to next), tuples might end up
>>> going into wrong heaps (like demonstrated in one of the reports [1]). A
>>> simple solution is to disable bulk-insert in case of partitioned tables.
>>>
>>> But my patch (or its motivations) was slightly wrongheaded, wherein I
>>> conflated multi-insert stuff and bulk-insert considerations. I revised
>>> 0002 to not do that.
>>
>> Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct.
>> Attaching updated 0002 along with rebased 0001 and 0003.
>
> The BulkInsertState is not there only to improve performance. It's
> also there to make sure we use a BufferAccessStrategy, so that we
> don't trash the whole buffer arena. See commit
> 85e2cedf985bfecaf43a18ca17433070f439fb0e. If a partitioned table uses
> a separate BulkInsertState for each partition, I believe it will also
> end up using a separate ring of buffers for every partition. That may
> well be faster than copying into an unpartitioned table in some cases,
> because dirtying everything in the buffer arena without actually
> writing any of those buffers is a lot faster than actually doing the
> writes. But it is also anti-social behavior; we have
> BufferAccessStrategy objects for a reason.

Thanks for the explanation. I agree that creating thousands of
BufferAccessStrategy objects would be a bad idea.

> One idea would be to have each partition use a separate
> BulkInsertState but have them point to the same underlying
> BufferAccessStrategy, but even that's problematic, because it could
> result in us holding a gigantic number of pins (one per partition). I
> think maybe a better idea would be to add an additional function
> ReleaseBulkInsertStatePin() which gets called whenever we switch
> relations, and then just use the same BulkInsertState throughout.

I tried implementing the second idea in the attached patch. It fixes the
bug (multiple reports as mentioned in the commit message) that tuples may
be inserted into the wrong partition.

Thanks,
Amit

Attachment Content-Type Size
0001-Make-partitioned-tables-play-nicely-with-bulk-insert.patch text/x-diff 3.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-01-23 10:26:01 Re: Checksums by default?
Previous Message Amit Kapila 2017-01-23 10:17:36 Re: pageinspect: Hash index support