Re: Multi Inserts in CREATE TABLE AS - revived patch

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Luc Vlaming <luc(at)swarm64(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Paul Guo <guopa(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Multi Inserts in CREATE TABLE AS - revived patch
Date: 2020-11-30 05:18:48
Message-ID: CALj2ACV8_O651C2zUqrVSRFDJkp8=TMwSdG9+mDGL+vF6CD+AQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Currently, the logic required for multi inserts (such as buffer slot
allocation, flushing, tuple size calculation to decide when to flush,
cleanup and so on) is handled outside of the existing tableam APIs.
There are a good number of cases where multi inserts can be used: the
existing COPY, plus CTAS and CREATE/REFRESH MATERIALIZED VIEW [proposed in
this thread] and INSERT INTO SELECT [here
<https://www.postgresql.org/list/pgsql-hackers/since/202009240000/>], which
are currently under discussion. Handling the same multi-insert logic in
many places is error-prone and duplicates most of the code. To avoid this,
I am proposing generic tableam APIs that can be used in all these cases and
that also give tableam developers the flexibility to implement multi-insert
logic that depends on the underlying storage engine[1].

I would like to seek thoughts/opinions on the proposed new APIs. Once
reviewed, I will start implementing them.

[1] -
https://www.postgresql.org/message-id/ca3dd08f-4ce0-01df-ba30-e9981bb0d54e%40swarm64.com

Below are the proposed structures and APIs:

/* Holds the multi insert related information. */
typedef struct MultiInsertStateData
{
    /* A temporary memory context for multi inserts. */
    MemoryContext micontext;
    /* Bulk insert state. */
    BulkInsertStateData *bistate;
    /* Array of buffered slots. */
    TupleTableSlot **mislots;
    /* Maximum number of slots that can be buffered. */
    int32 nslots;
    /* Number of slots that are currently buffered. */
    int32 nused;
    /*
     * Maximum total tuple size that can be buffered in
     * a single batch. Flush the buffered tuples if the
     * current total tuple size nsize >= nbytes.
     */
    int64 nbytes;
    /*
     * Total tuple size in bytes of the slots that are
     * currently buffered.
     */
    int64 nsize;
    /*
     * Whether to clear the buffered slots' content after
     * the flush. If the relation has indexes or AFTER ROW
     * triggers, the buffered slots are still required
     * outside do_multi_insert(), so the caller must clear
     * them with ExecClearTuple() itself. If true,
     * do_multi_insert() can clear the slots.
     */
    bool clearslots;
    /*
     * If true, do_multi_insert() flushes the buffered
     * slots, if any, bypassing the slot count and total
     * tuple size checks. This is useful when one of the
     * partitions cannot use multi inserts but others can
     * and have buffered a few slots so far: those slots
     * must be flushed for visibility before the partition
     * that doesn't support multi inserts proceeds with
     * single inserts.
     */
    bool forceflush;
} MultiInsertStateData;
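The flush condition implied by the nslots/nused, nbytes/nsize and forceflush fields can be sketched as a small predicate. This is a minimal, PostgreSQL-free sketch, not part of the proposal: MockMultiInsertCounters and mock_should_flush() are hypothetical names, only the counters of MultiInsertStateData are mirrored, and nbytes == 0 is assumed to mean "no size limit given", as the do_multi_insert comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters of MultiInsertStateData. */
typedef struct MockMultiInsertCounters
{
    int32_t nslots;     /* maximum number of slots that can be buffered */
    int32_t nused;      /* number of slots currently buffered */
    int64_t nbytes;     /* maximum total tuple size per batch; 0 = not given */
    int64_t nsize;      /* total size of the currently buffered tuples */
    bool    forceflush; /* flush regardless of the count and size checks */
} MockMultiInsertCounters;

/* Returns true if the buffered tuples should be flushed now. */
static bool
mock_should_flush(const MockMultiInsertCounters *c)
{
    if (c->nused == 0)
        return false;           /* nothing is buffered */
    if (c->forceflush)
        return true;            /* bypass the slot count and size checks */
    if (c->nused >= c->nslots)
        return true;            /* every slot is in use */
    /* The size check applies only when a byte limit was given. */
    return c->nbytes > 0 && c->nsize >= c->nbytes;
}
```

Checking forceflush before the other limits matches the field's description: it short-circuits both the slot count and the total size checks, but still has no effect when nothing is buffered.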

/*
* Allocates and initializes the MultiInsertStateData. Creates a temporary
* memory context for multi inserts, allocates BulkInsertStateData.
*/
void (*begin_multi_insert) (Relation rel,
                            MultiInsertStateData **mistate,
                            uint32 nslots,
                            uint64 nbytes);
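What begin_multi_insert allocates can be sketched outside PostgreSQL as follows. This is a hypothetical illustration under stated assumptions: MockMIState and mock_begin_multi_insert() are invented names, buffered tuple sizes stand in for TupleTableSlot pointers, and plain calloc() replaces the temporary memory context and the BulkInsertStateData allocation that the real API would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified state: sizes stand in for TupleTableSlot pointers. */
typedef struct MockMIState
{
    int64_t *slot_sizes; /* stands in for TupleTableSlot **mislots */
    int32_t  nslots;
    int32_t  nused;
    int64_t  nbytes;     /* 0 = no total-size limit */
    int64_t  nsize;
} MockMIState;

/*
 * Sketch of begin_multi_insert: allocate the state and its slot array and
 * hand it back through the output parameter, as the proposed signature does.
 * The real API would also create a temporary memory context and allocate a
 * BulkInsertStateData; calloc() stands in for both here.
 */
static bool
mock_begin_multi_insert(MockMIState **mistate, uint32_t nslots, uint64_t nbytes)
{
    MockMIState *s;

    *mistate = NULL;
    /* The parameters are unsigned but the state fields are signed. */
    if (nslots == 0 || nslots > INT32_MAX || nbytes > INT64_MAX)
        return false;
    s = calloc(1, sizeof(MockMIState));
    if (s == NULL)
        return false;
    s->slot_sizes = calloc(nslots, sizeof(int64_t));
    if (s->slot_sizes == NULL)
    {
        free(s);
        return false;
    }
    s->nslots = (int32_t) nslots;
    s->nbytes = (int64_t) nbytes;   /* nused and nsize start at 0 via calloc */
    *mistate = s;
    return true;
}
```

The range check reflects a detail of the proposal as written: begin_multi_insert takes uint32 nslots and uint64 nbytes, while MultiInsertStateData stores them as int32 and int64, so an implementation would need either such a check or matching types.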

/*
 * Buffers the input slot into the mistate slots. Computes the size of the
 * tuple and adds it to the total size of the buffered tuples. If this total
 * reaches nbytes, flushes the buffered tuples into the table and clears the
 * buffered slots' content if clearslots is true. If nbytes, i.e. the maximum
 * total tuple size of the buffered tuples, is not given, the tuple size is
 * not calculated; tuples are buffered until all nslots slots are filled and
 * then flushed.
 *
 * For heapam, the existing heap_multi_insert() can be called through
 * rel->rd_tableam->multi_insert() for flushing.
 */
void (*do_multi_insert) (Relation rel,
                         struct MultiInsertStateData *mistate,
                         struct TupleTableSlot *slot,
                         CommandId cid,
                         int options);
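The buffering rule described for do_multi_insert (accumulate tuple sizes only when nbytes is given, flush when either nslots or nbytes is reached) can be simulated in a few lines. This is a minimal sketch under assumptions: MockMIState, mock_flush() and mock_do_multi_insert() are hypothetical, tuple sizes stand in for slots, and the flush only counts batches where a heapam implementation would hand the slots to rel->rd_tableam->multi_insert().

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_NSLOTS 4   /* hypothetical small slot count for illustration */

/* Hypothetical state; tuple sizes stand in for buffered TupleTableSlots. */
typedef struct MockMIState
{
    int64_t slot_sizes[MOCK_NSLOTS];
    int32_t nslots;          /* must not exceed MOCK_NSLOTS */
    int32_t nused;
    int64_t nbytes;          /* 0 = not given: flush only when all slots are full */
    int64_t nsize;
    int32_t nflushes;        /* batches written so far (illustration only) */
    int64_t ntuples_written; /* tuples written so far (illustration only) */
} MockMIState;

/* Stand-in for passing the batch to rel->rd_tableam->multi_insert(). */
static void
mock_flush(MockMIState *s)
{
    if (s->nused == 0)
        return;
    s->nflushes++;
    s->ntuples_written += s->nused;
    s->nused = 0;   /* with clearslots, the slots would also be cleared here */
    s->nsize = 0;
}

/* Sketch of do_multi_insert: buffer one tuple, flush when a limit is reached. */
static void
mock_do_multi_insert(MockMIState *s, int64_t tuple_size)
{
    s->slot_sizes[s->nused++] = tuple_size;
    if (s->nbytes > 0)
        s->nsize += tuple_size;   /* size is tracked only if nbytes was given */
    if (s->nused >= s->nslots || (s->nbytes > 0 && s->nsize >= s->nbytes))
        mock_flush(s);
}
```

With nbytes = 100, two tuples of 60 and 50 bytes trigger a flush on the second insert (110 >= 100); with nbytes = 0 and four slots, nine inserts produce two full batches and leave one tuple buffered for end_multi_insert to flush.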

/*
 * Flushes the buffered tuples, if any. Clears the buffered slots' content if
 * clearslots is true. Deletes the temporary memory context and deallocates
 * mistate.
 */
void (*end_multi_insert) (Relation rel,
                          struct MultiInsertStateData *mistate,
                          CommandId cid,
                          int options);
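The intended call pattern (begin once, do_multi_insert per tuple, end once so the final partial batch is flushed) can be simulated without PostgreSQL. This is a hypothetical sketch: all names are invented, only the slot count is modelled (nbytes not given), and free() stands in for deleting the temporary memory context. A CTAS-style caller would presumably call begin at startup, do per received tuple, and end at shutdown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, count-only state (nbytes not given); no PostgreSQL types. */
typedef struct MockMIState
{
    int32_t nslots;
    int32_t nused;
    int32_t nflushes;        /* batches written (illustration only) */
    int64_t ntuples_written; /* tuples written (illustration only) */
} MockMIState;

static void
mock_flush(MockMIState *s)
{
    if (s->nused == 0)
        return;
    s->nflushes++;
    s->ntuples_written += s->nused;
    s->nused = 0;
}

/* Buffers one (unmodelled) tuple and flushes when all slots are used. */
static void
mock_do_multi_insert(MockMIState *s)
{
    if (++s->nused >= s->nslots)
        mock_flush(s);
}

/*
 * Sketch of end_multi_insert: flush whatever is still buffered, then release
 * the state. Returns the number of tuples written and reports the batch count.
 */
static int64_t
mock_end_multi_insert(MockMIState *s, int32_t *nflushes)
{
    int64_t written;

    mock_flush(s);              /* the final, possibly partial, batch */
    written = s->ntuples_written;
    if (nflushes != NULL)
        *nflushes = s->nflushes;
    free(s);                    /* stands in for deleting the memory context */
    return written;
}

/* Caller pattern: begin once, do per tuple, end once. */
static int64_t
mock_insert_all(int64_t ntuples, int32_t nslots, int32_t *nflushes)
{
    MockMIState *s = calloc(1, sizeof(MockMIState));   /* begin_multi_insert */

    if (s == NULL)
        return -1;
    s->nslots = nslots;
    for (int64_t i = 0; i < ntuples; i++)
        mock_do_multi_insert(s);                       /* do_multi_insert */
    return mock_end_multi_insert(s, nflushes);         /* end_multi_insert */
}
```

For example, inserting 2500 tuples with nslots = 1000 yields two full batches from do_multi_insert and a third batch of 500 from end_multi_insert; skipping the end call would silently lose those 500 tuples.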

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
