Re: New Table Access Methods for Multi and Single Inserts

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
Date: 2022-03-06 11:11:54
Message-ID: CALj2ACXSRTo4O_YzTg2cnsocnkwN+hyQH4HAm3KH2Z83NofWKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 4, 2022 at 8:07 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> > Another rebase due to conflicts in 0003. Attaching v6 for review.
>
> I recently touched the topic of multi_insert, and I remembered this
> patch. I had to dig a bit to find it, but as it's still open I've
> added some comments:

Thanks for reviving the thread. I almost lost hope in it. In fact, it
took me a while to recollect the work and respond to your comments.
I'm now happy to answer or continue working on this patch if you or
someone is really interested to review it and take it to the end.

> > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> > +#define MAX_BUFFERED_TUPLES 1000
> > +#define MAX_BUFFERED_BYTES 65535
>
> It looks like these values were copied over from copyfrom.c, but are
> now expressed in the context of the heapam.
> As these values are now heap-specific (as opposed to the
> TableAM-independent COPY infrastructure), shouldn't we instead
> optimize for heap page insertions? That is, I suggest using a multiple
> (1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a
> similar multiple of BLCKSZ for MAX_BUFFERED_BYTES.

We can do that. In fact, it is a good idea to let callers pass in as
an input to tuple_insert_begin and have it as part of
TableInsertState. If okay, I will do that in the next patch.

> > TableInsertState->flushed
> > TableInsertState->mi_slots
>
> I don't quite like the current storage-and-feedback mechanism for
> flushed tuples. The current assumptions in this mechanism seem to be
> that
> 1.) access methods always want to flush all available tuples at once,
> 2.) access methods want to maintain the TupleTableSlots for all
> inserted tuples that have not yet had all triggers handled, and
> 3.) we need access to the not-yet-flushed TupleTableSlots.
>
> I think that that is not a correct set of assumptions; I think that
> only flushed tuples need to be visible to the tableam-agnostic code;
> and that tableams should be allowed to choose which tuples to flush at
> which point; as long as they're all flushed after a final
> multi_insert_flush.
>
> Examples:
> A heap-based access method might want bin-pack tuples and write out
> full pages only; and thus keep some tuples in the buffers as they
> didn't fill a page; thus having flushed only a subset of the current
> buffered tuples.
> A columnstore-based access method might not want to maintain the
> TupleTableSlots of original tuples, but instead use temporary columnar
> storage: TupleTableSlots are quite large when working with huge
> amounts of tuples; and keeping lots of tuple data in memory is
> expensive.
>
> As such, I think that this should be replaced with a
> TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len,
> managed by the tableAM, in which only the flushed tuples are visible
> to the AM-agnostic code. This is not much different from how the
> implementation currently works; except that ->mi_slots now does not
> expose unflushed tuples; and that ->flushed is replaced by an integer
> value of number of flushed tuples.

Yeah, that makes sense. Let's table AMs expose the flushed tuples
outside on which the callers can handle the after-insert row triggers
upon them.

IIUC, TableInsertState needs to have few other variables:

/* Below members are only used for multi inserts. */
/* Array of buffered slots. */
TupleTableSlot **mi_slots;
/* Number of slots that are currently buffered. */
int32 mi_cur_slots;
/* Array of flushed slots that will be used by callers to handle
after-insert row triggers or similar events outside */
TupleTableSlot **mi_flushed_slots ;
/* Number of slots that are currently buffered. */
int32 mi_no_of_flushed_slots;

The implementation of heap_multi_insert_flush will just set the
mi_slots to mi_flushed_slots.

> A further improvement (in my opinion) would be the change from a
> single multi_insert_flush() to a signalling-based multi_insert_flush:
> It is not unreasonable for e.g. a columnstore to buffer tens of
> thousands of inserts; but doing so in TupleTableSlots would introduce
> a high memory usage. Allowing for batched retrieval of flushed tuples
> would help in memory usage; which is why multiple calls to
> multi_insert_flush() could be useful. To handle this gracefully, we'd
> probably add TIS->mi_flush_remaining, where each insert adds one to
> mi_flush_remaining; and each time mi_flushed_slots has been handled
> mi_flush_remaining is decreased by mi_flushed_len by the handler code.
> Once we're done inserting into the table, we keep calling
> multi_insert_flush until no more tuples are being flushed (and error
> out if we're still waiting for flushes but no new flushed tuples are
> returned).

The current approach is signalling-based right?
heap_multi_insert_v2
if (state->mi_cur_slots >= mistate->max_slots ||
mistate->cur_size >= mistate->max_size)
heap_multi_insert_flush(state);

The table_multi_insert_v2 am implementers will have to carefully
choose buffering strategy i.e. number of tuples, size to buffer and
decide rightly without hitting memory usages.

Thoughts?

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-03-06 12:46:04 Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Previous Message Gilles Darold 2022-03-06 08:39:37 Re: [Proposal] vacuumdb --schema only