Re: New Table Access Methods for Multi and Single Inserts

From: Luc Vlaming <luc(at)swarm64(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
Date: 2021-01-12 08:03:33
Message-ID: c2ba9c52-8c5a-13f0-7d50-b569f7c93840@swarm64.com
Lists: pgsql-hackers

On 06-01-2021 14:06, Bharath Rupireddy wrote:
> On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
>> The main reason for me for wanting a single API is that I would like the
>> decision of using single or multi inserts to move to inside the tableam.
>> For e.g. a heap insert we might want to put the threshold at e.g. 100
>> rows so that the overhead of buffering the tuples is actually
>> compensated. For other tableam this logic might also be quite different,
>> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
>> whether or not multi inserts should be used. Because otherwise the thing
>> we'll get is that there will be tableams that will ignore this flag and
>> do their own thing anyway. I'd rather have an API that gives all
>> necessary information to the tableam and then make the tableam do "the
>> right thing".
>>
>> Another reason I'm suggesting this API is that I would expect that the
>> begin is called in a different place in the code for the (multiple)
>> inserts than the actual insert statement.
>> To me conceptually the begin and end are like e.g. the executor begin
>> and end: you prepare the inserts with the knowledge you have at that
>> point. I assumed (wrongly?) that during the start of the statement one
>> knows best how many rows are coming; and then the actual insertion of
>> the row doesn't have to deal anymore with multi/single inserts, choosing
>> when to buffer or not, because that information has already been given
>> during the initial phase. One of the reasons this is appealing to me is
>> that e.g. in [1] there was discussion on when to switch to a multi
>> insert state, and imo this should be up to the tableam.
>
> Agree that whether to go with the multi or single inserts should be
> completely left to tableam implementation, we, as callers of those API
> just need to inform whether we expect single or multiple rows, and it
> should be left to tableam implementation whether to actually go with
> buffering or single inserts. ISTM that it's an elegant way of making
> the API generic and abstracting everything from the callers. What I
> wonder is how we can know in advance the expected row count that we
> need to pass in to heap_insert_begin(). IIUC, we cannot estimate the
> upcoming rows in COPY, Insert Into Select, Refresh Mat View, or some
> other insert queries. Of course, we can look at the planner's
> estimated row count for the selects in COPY, Insert Into Select or
> Refresh Mat View after planning, but to me that's not something we
> can depend on to pass the row count to the insert APIs.
>
> When we don't know the expected row count, why can't we (as callers of
> the APIs) tell the APIs something like, "I'm intending to perform
> multi inserts, so if possible and if you have a mechanism to buffer
> the slots, do it; otherwise insert the tuples one by one, or else do
> whatever you want to do with the tuples I give you". So, in case of
> COPY we can ask the API for multi inserts and call heap_insert_begin()
> and heap_insert_v2().
>

I thought that when it is available (because of planning) it would be
nice to pass it in. If you don't know, you could pass in 1 for doing
single inserts, and e.g. -1 or INT_MAX for streaming. The reason I
proposed it is so that tableams have as much knowledge as possible to
do the right thing. is_multi does also work of course, but is just
somewhat less informative.

What to me seemed somewhat counterintuitive is that with the proposed
API it is possible to say is_multi=true and then still call
heap_insert_v2 to do a single insert.

> Given the above explanation, I still feel bool is_multi would suffice.
>
> Thoughts?
>
> On dynamically switching from single to multi inserts: this can be
> done by heap_insert_v2 itself. The way I think it's possible is that,
> say we have some threshold row count of 1000 (can be a macro); after
> inserting that many tuples, heap_insert_v2 can switch to buffering
> mode.

For that I thought it'd be good to use the expected row count, but yeah
dynamically switching also works and might work better if the expected
row counts are usually off.

>
> Thoughts?
>
>> Which would make the code something like:
>>
>> void
>> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
>> {
>>     TupleTableSlot *batchslot;
>>     HeapMultiInsertState *mistate = (HeapMultiInsertState *) state->mistate;
>>     Size sz;
>>
>>     Assert(mistate && mistate->slots);
>>
>>     if (mistate->slots[mistate->cur_slots] == NULL)
>>         mistate->slots[mistate->cur_slots] =
>>             table_slot_create(state->rel, NULL);
>>
>>     batchslot = mistate->slots[mistate->cur_slots];
>>
>>     ExecClearTuple(batchslot);
>>     ExecCopySlot(batchslot, slot);
>>
>>     /*
>>      * Calculate the tuple size after the original slot is copied, because
>>      * the copied slot type and the tuple size may change.
>>      */
>>     sz = GetTupleSize(batchslot, mistate->max_size);
>>
>>     Assert(sz > 0);
>>
>>     mistate->cur_slots++;
>>     mistate->cur_size += sz;
>>
>>     if (mistate->cur_slots >= mistate->max_slots ||
>>         mistate->cur_size >= mistate->max_size)
>>         heap_multi_insert_flush(state);
>> }
>
> I think clearing tuples before copying the slot as you suggested may
> work without the need of a clear_slots flag.

ok, cool :)

>
>>
>>> > Also, why do we want to do ExecClearTuple() anyway? Isn't
>>> > it good enough that the next call to ExecCopySlot will effectively clear
>>> > it out?
>>>
>>> For virtual, heap, and minimal tuple slots, yes, ExecCopySlot clears the
>>> destination slot before copying. But for buffer heap slots,
>>> tts_buffer_heap_copyslot does not always clear the destination slot, see
>>> below. If we fall into the else condition, we might get some issues. And
>>> also note that, once the slot is cleared in ExecClearTuple, it will not
>>> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
>>> false. That is why let's keep ExecClearTuple as is.
>>>
>> I had no idea the buffer heap slot doesn't unconditionally clear out the
>> slot :( So yes lets call it unconditionally ourselves. See also
>> suggestion above.
>
> Yeah, we will clear the tuple slot before copy to be on the safer side.
>

ok

>>> /*
>>>  * If the source slot is of a different kind, or is a buffer slot that
>>>  * has been materialized / is virtual, make a new copy of the tuple.
>>>  * Otherwise make a new reference to the in-buffer tuple.
>>>  */
>>> if (dstslot->tts_ops != srcslot->tts_ops ||
>>>     TTS_SHOULDFREE(srcslot) ||
>>>     !bsrcslot->base.tuple)
>>> {
>>>     MemoryContext oldContext;
>>>
>>>     ExecClearTuple(dstslot);
>>>     ...
>>> }
>>> else
>>> {
>>>     Assert(BufferIsValid(bsrcslot->buffer));
>>>
>>>     tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
>>>                                 bsrcslot->buffer, false);
>>>
>>> > - flushed -> why is this a stored boolean? isn't this indirectly encoded
>>> > by cur_slots/cur_size == 0?
>>>
>>> Note that cur_slots is in HeapMultiInsertState, and outside of the new
>>> APIs, i.e. in TableInsertState, mistate is a void pointer, so we can't
>>> really access cur_slots. I mean, we can access it, but we would need to
>>> dereference it based on the tableam kind. Instead of doing all of that,
>>> to keep the API cleaner, I chose to have a boolean in TableInsertState
>>> which we can see and use outside of the new APIs. Hope that's fine.
>>>
>> So you mean the flushed variable is actually there to tell the user of
>> the API that they are supposed to call flush before end? Why can't the
>> end call flush itself then? I guess I completely misunderstood the
>> purpose of table_multi_insert_flush being public. I had assumed it is
>> there so that the usage site can indicate that now would be a good time
>> to flush, e.g. because of a statement ending or something. I had not
>> understood that it is always required to do
>> table_multi_insert_flush + table_insert_end.
>> IMHO I would hide this from the caller, given that you would only really
>> call flush yourself when you would immediately afterwards call end, or
>> are there other cases where one would be required to explicitly call
>> flush?
>
> We need to know outside the multi_insert API whether the buffered
> slots in case of multi inserts are flushed. Reason is that if we have
> indexes or after row triggers, currently we call ExecInsertIndexTuples
> or ExecARInsertTriggers on the buffered slots outside the API in a
> loop after the flush.
>
> If we agree on removing the heap_multi_insert_v2 API and embedding that
> logic inside heap_insert_v2, then we can do this - pass the required
> information and the functions ExecInsertIndexTuples and
> ExecARInsertTriggers as callbacks so that, whether heap_insert_v2
> chooses single or multi inserts, it can call back these functions with
> the required information after the flush. We can add the callback and
> the required information into TableInsertState. But I'm not quite sure
> how we would wire up ExecInsertIndexTuples and ExecARInsertTriggers as
> generic callbacks.
>
> If we don't want to go with callback way, then at least we need to
> know whether or not heap_insert_v2 has chosen multi inserts, if yes,
> the buffered slots array, and the number of current buffered slots,
> whether they are flushed or not in the TableInsertState. Then,
> eventually, we might need all the HeapMultiInsertState info in the
> TableInsertState.
>

To me the callback API seems cleaner: on heap_insert_begin we can
pass in a callback that is called on every flushed slot, or only on
multi-insert flushes. Is there a reason it would only be done for
multi-insert flushes, or can it be generic?

> Thoughts?
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

Hi,

Replied inline.

Kind regards,
Luc
