Re: New Table Access Methods for Multi and Single Inserts

From: Luc Vlaming <luc(at)swarm64(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
Date: 2021-01-06 07:26:38
Message-ID: 508af801-6356-d36b-1867-011ac6df8f55@swarm64.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05-01-2021 11:06, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 1:29 PM Luc Vlaming <luc(at)swarm64(dot)com
> <mailto:luc(at)swarm64(dot)com>> wrote:
> >  > table AM patch [2] be reviewed further?
> > As to the patches themselves:
> >
> > I think the API is a huge step forward! I assume that we want to have a
> > single-insert API like heap_insert_v2 so that we can encode the
> > knowledge that there will just be a single insert coming and likely a
> > commit afterwards?
> >
> > Reason I'm asking is that I quite liked the heap_insert_begin parameter
> > is_multi, which could even be turned into a "expected_rowcount" of the
> > amount of rows expected to be commited in the transaction (e.g. single,
> > several, thousands/stream).
> > If we were to make the API based on expected rowcounts, the whole
> > heap_insert_v2, heap_insert and heap_multi_insert could be turned into a
> > single function heap_insert, as the knowledge about buffering of the
> > slots is then already stored in the TableInsertState, creating an API
> like:
> >
> > // expectedRows: -1 = streaming, otherwise expected rowcount.
> > TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int
> > options, int expectedRows);
> > heap_insert(TableInsertState *state, TupleTableSlot *slot);
> >
> > Do you think that's a good idea?
>
> IIUC, your suggestion is to use expectedRows and move the multi insert
> implementation heap_multi_insert_v2 to heap_insert_v2. If that's
> correct, so heap_insert_v2 will look something like this:
>
> heap_insert_v2()
> {
>     if (single_insert)
>       //do single insertion work, the code in existing heap_insert_v2
> comes here
>    else
>       //do multi insertion work, the code in existing
> heap_multi_insert_v2 comes here
> }
>
> I don't see any problem in combining single and multi insert APIs into
> one. Having said that, will the APIs be cleaner then? Isn't it going to
> be confusing if a single heap_insert_v2 API does both the works? With
> the existing separate APIs, for single insertion, the sequence of the
> API can be like begin, insert_v2, end and for multi inserts it's like
> begin, multi_insert_v2, flush, end. I prefer to have a separate multi
> insert API so that it will make the code look readable.
>
> Thoughts?

The main reason for me for wanting a single API is that I would like the
decision of using single or multi inserts to move to inside the tableam.
For e.g. a heap insert we might want to put the threshold at e.g. 100
rows so that the overhead of buffering the tuples is actually
compensated. For other tableam this logic might also be quite different,
and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
whether or not multi inserts should be used. Because otherwise the thing
we'll get is that there will be tableams that will ignore this flag and
do their own thing anyway. I'd rather have an API that gives all
necessary information to the tableam and then make the tableam do "the
right thing".

Another reason I'm suggesting this API is that I would expect that the
begin is called in a different place in the code for the (multiple)
inserts than the actual insert statement.
To me conceptually the begin and end are like e.g. the executor begin
and end: you prepare the inserts with the knowledge you have at that
point. I assumed (wrongly?) that during the start of the statement one
knows best how many rows are coming; and then the actual insertion of
the row doesn't have to deal anymore with multi/single inserts, choosing
when to buffer or not, because that information has already been given
during the initial phase. One of the reasons this is appealing to me is
that e.g. in [1] there was discussion on when to switch to a multi
insert state, and imo this should be up to the tableam.

>
> > Two smaller things I'm wondering:
> > - the clear_mi_slots; why is this not in the HeapMultiInsertState? the
> > slots themselves are declared there?
>
> Firstly, we need to have the buffered slots sometimes(please have a look
> at the comments in TableInsertState structure) outside the multi_insert
> API. And we need to have cleared the previously flushed slots before we
> start buffering in heap_multi_insert_v2(). I can remove the
> clear_mi_slots flag altogether and do as follows: I will not set
> mistate->cur_slots to 0 in heap_multi_insert_flush after the flush, I
> will only set state->flushed to true. In heap_multi_insert_v2,
>
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
>     TupleTableSlot  *batchslot;
>     HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
>     Size sz;
>
>     Assert(mistate && mistate->slots);
>
> *  /* if the slots are flushed previously then clear them off before
> using them again. */
>     if (state->flushed)
>     {
>         int i;
>
>         for (i = 0; i < mistate->cur_slots; i++)
>             ExecClearTuple(mistate->slots[i]);
>
>         mistate->cur_slots = 0;
>         state->flushed = false
>     }*
>
>     if (mistate->slots[mistate->cur_slots] == NULL)
>         mistate->slots[mistate->cur_slots] =
>                                     table_slot_create(state->rel, NULL);
>
>     batchslot = mistate->slots[mistate->cur_slots];
>
>     ExecCopySlot(batchslot, slot);
>
> Thoughts?

From what I can see you can just keep the v2-0001 patch and:
- remove the flushed variable alltogether. mistate->cur_slots == 0
encodes this already and the variable is never actually checked on.
- call ExecClearTuple just before ExecCopySlot()

Which would make the code something like:

void
heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
{
TupleTableSlot *batchslot;
HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
Size sz;

Assert(mistate && mistate->slots);

if (mistate->slots[mistate->cur_slots] == NULL)
mistate->slots[mistate->cur_slots] =
table_slot_create(state->rel, NULL);

batchslot = mistate->slots[mistate->cur_slots];

ExecClearTuple(batchslot);
ExecCopySlot(batchslot, slot);

/*
* Calculate the tuple size after the original slot is copied, because the
* copied slot type and the tuple size may change.
*/
sz = GetTupleSize(batchslot, mistate->max_size);

Assert(sz > 0);

mistate->cur_slots++;
mistate->cur_size += sz;

if (mistate->cur_slots >= mistate->max_slots ||
mistate->cur_size >= mistate->max_size)
heap_multi_insert_flush(state);
}

void
heap_multi_insert_flush(TableInsertState *state)
{
HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
MemoryContext oldcontext;

Assert(mistate && mistate->slots && mistate->cur_slots >= 0 &&
mistate->context);

if (mistate->cur_slots == 0)
return;

oldcontext = MemoryContextSwitchTo(mistate->context);

heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
state->cid, state->options, state->bistate);

MemoryContextReset(mistate->context);
MemoryContextSwitchTo(oldcontext);

/*
* Do not clear the slots always. Sometimes callers may want the slots for
* index insertions or after row trigger executions in which case they have
* to clear the tuples before using for the next insert batch.
*/
if (state->clear_mi_slots)
{
int i;

for (i = 0; i < mistate->cur_slots; i++)
ExecClearTuple(mistate->slots[i]);
}

mistate->cur_slots = 0;
mistate->cur_size = 0;
}

>
> > Also, why do we want to do ExecClearTuple() anyway? Isn't
> > it good enough that the next call to ExecCopySlot will effectively clear
> > it out?
>
> For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
> slot before copying. But, for buffer heap slots, the
> tts_buffer_heap_copyslot does not always clear the destination slot, see
> below. If we fall into else condition, we might get some issues. And
> also note that, once the slot is cleared in ExecClearTuple, it will not
> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
> false. That is why, let's have ExecClearTuple as is.
>
I had no idea the buffer heap slot doesn't unconditionally clear out the
slot :( So yes lets call it unconditionally ourselves. See also
suggestion above.

>     /*
>      * If the source slot is of a different kind, or is a buffer slot
> that has
>      * been materialized / is virtual, make a new copy of the tuple.
> Otherwise
>      * make a new reference to the in-buffer tuple.
>      */
>     if (dstslot->tts_ops != srcslot->tts_ops ||
>         TTS_SHOULDFREE(srcslot) ||
>         !bsrcslot->base.tuple)
>     {
>         MemoryContext oldContext;
>
>         ExecClearTuple(dstslot);
>     }
>     else
>     {
>         Assert(BufferIsValid(bsrcslot->buffer));
>
>         tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
>                                     bsrcslot->buffer, false);
>
> > - flushed -> why is this a stored boolean? isn't this indirectly encoded
> > by cur_slots/cur_size == 0?
>
> Note that cur_slots is in HeapMultiInsertState and outside of the new
> APIs i.e. in TableInsertState, mistate is a void pointer, and we can't
> really access the cur_slots. I mean, we can access but we need to be
> dereferencing using the tableam kind. Instead of  doing all of that, to
> keep the API cleaner, I chose to have a boolean in the TableInsertState
> which we can see and use outside of the new APIs. Hope that's fine.
>
So you mean the flushed variable is actually there to tell the user of
the API that they are supposed to call flush before end? Why can't the
end call flush itself then? I guess I completely misunderstood the
purpose of table_multi_insert_flush being public. I had assumed it is
there to from the usage site indicate that now would be a good time to
flush, e.g. because of a statement ending or something. I had not
understood this is a requirement that its always required to do
table_multi_insert_flush + table_insert_end.
IMHO I would hide this from the callee, given that you would only really
call flush yourself when you immediately after would call end, or are
there other cases where one would be required to explicitly call flush?

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>

Kind regards,
Luc

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2021-01-06 07:31:23 Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)
Previous Message Luc Vlaming 2021-01-06 07:00:55 Re: New Table Access Methods for Multi and Single Inserts