Re: New Table Access Methods for Multi and Single Inserts

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
Date: 2022-03-07 16:09:23
Message-ID: CAEze2Wi-CauoDfu9M2av4ntXdvoq2C6UdTnECTCL94Xpz7xi1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 6 Mar 2022 at 12:12, Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Mar 4, 2022 at 8:07 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > > Another rebase due to conflicts in 0003. Attaching v6 for review.
> >
> > I recently touched the topic of multi_insert, and I remembered this
> > patch. I had to dig a bit to find it, but as it's still open I've
> > added some comments:
>
> Thanks for reviving the thread. I almost lost hope in it. In fact, it
> took me a while to recollect the work and respond to your comments.
> I'm now happy to answer or continue working on this patch if you or
> someone is really interested to review it and take it to the end.
>
> > > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> > > +#define MAX_BUFFERED_TUPLES 1000
> > > +#define MAX_BUFFERED_BYTES 65535
> >
> > It looks like these values were copied over from copyfrom.c, but are
> > now expressed in the context of the heapam.
> > As these values are now heap-specific (as opposed to the
> > TableAM-independent COPY infrastructure), shouldn't we instead
> > optimize for heap page insertions? That is, I suggest using a multiple
> > (1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a
> > similar multiple of BLCKSZ for MAX_BUFFERED_BYTES.
>
> We can do that. In fact, it is a good idea to let callers pass in as
> an input to tuple_insert_begin and have it as part of
> TableInsertState. If okay, I will do that in the next patch.
>
> > > TableInsertState->flushed
> > > TableInsertState->mi_slots
> >
> > I don't quite like the current storage-and-feedback mechanism for
> > flushed tuples. The current assumptions in this mechanism seem to be
> > that
> > 1.) access methods always want to flush all available tuples at once,
> > 2.) access methods want to maintain the TupleTableSlots for all
> > inserted tuples that have not yet had all triggers handled, and
> > 3.) we need access to the not-yet-flushed TupleTableSlots.
> >
> > I think that that is not a correct set of assumptions; I think that
> > only flushed tuples need to be visible to the tableam-agnostic code;
> > and that tableams should be allowed to choose which tuples to flush at
> > which point; as long as they're all flushed after a final
> > multi_insert_flush.
> >
> > Examples:
> > A heap-based access method might want bin-pack tuples and write out
> > full pages only; and thus keep some tuples in the buffers as they
> > didn't fill a page; thus having flushed only a subset of the current
> > buffered tuples.
> > A columnstore-based access method might not want to maintain the
> > TupleTableSlots of original tuples, but instead use temporary columnar
> > storage: TupleTableSlots are quite large when working with huge
> > amounts of tuples; and keeping lots of tuple data in memory is
> > expensive.
> >
> > As such, I think that this should be replaced with a
> > TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len,
> > managed by the tableAM, in which only the flushed tuples are visible
> > to the AM-agnostic code. This is not much different from how the
> > implementation currently works; except that ->mi_slots now does not
> > expose unflushed tuples; and that ->flushed is replaced by an integer
> > value of number of flushed tuples.
>
> Yeah, that makes sense. Let's table AMs expose the flushed tuples
> outside on which the callers can handle the after-insert row triggers
> upon them.
>
> IIUC, TableInsertState needs to have few other variables:
>
> /* Below members are only used for multi inserts. */
> /* Array of buffered slots. */
> TupleTableSlot **mi_slots;

Not quite: there's no need for TupleTableSlot **mi_slots in the
TableInsertState; as the buffer used by the tableAM to buffer
unflushed tuples shouldn't be publicly visible. I suspect that moving
that field to HeapMultiInsertState instead would be the prudent thing
to do; limiting the external access of AM-specific buffers.

> /* Number of slots that are currently buffered. */
> int32 mi_cur_slots;
> /* Array of flushed slots that will be used by callers to handle
> after-insert row triggers or similar events outside */
> TupleTableSlot **mi_flushed_slots ;
> /* Number of slots that are currently buffered. */
> int32 mi_no_of_flushed_slots;
>
> The implementation of heap_multi_insert_flush will just set the
> mi_slots to mi_flushed_slots.

Yes.

> > A further improvement (in my opinion) would be the change from a
> > single multi_insert_flush() to a signalling-based multi_insert_flush:
> > It is not unreasonable for e.g. a columnstore to buffer tens of
> > thousands of inserts; but doing so in TupleTableSlots would introduce
> > a high memory usage. Allowing for batched retrieval of flushed tuples
> > would help in memory usage; which is why multiple calls to
> > multi_insert_flush() could be useful. To handle this gracefully, we'd
> > probably add TIS->mi_flush_remaining, where each insert adds one to
> > mi_flush_remaining; and each time mi_flushed_slots has been handled
> > mi_flush_remaining is decreased by mi_flushed_len by the handler code.
> > Once we're done inserting into the table, we keep calling
> > multi_insert_flush until no more tuples are being flushed (and error
> > out if we're still waiting for flushes but no new flushed tuples are
> > returned).
>
> The current approach is signalling-based right?
> heap_multi_insert_v2
> if (state->mi_cur_slots >= mistate->max_slots ||
> mistate->cur_size >= mistate->max_size)
> heap_multi_insert_flush(state);

That's for the AM-internal flushing; yes. I was thinking about the AM
api for flushing that's used when finalizing the batched insert; i.e.
table_multi_insert_flush.

Currently it assumes that all buffered tuples will be flushed after
one call (which is correct for heap), but putting those unflushed
tuples all at once back in memory might not be desirable or possible
(for e.g. columnar); so we might need to call table_multi_insert_flush
until there's no more buffered tuples.

> The table_multi_insert_v2 am implementers will have to carefully
> choose buffering strategy i.e. number of tuples, size to buffer and
> decide rightly without hitting memory usages.

Agreed

-Matthias

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2022-03-07 16:27:03 Re: Handle infinite recursion in logical replication setup
Previous Message Daniel Westermann (DWE) 2022-03-07 16:06:22 Re: Changing "Hot Standby" to "hot standby"