Re: New Table Access Methods for Multi and Single Inserts

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
Date: 2023-06-03 22:38:24
Message-ID: 20230603223824.o7iyochli2dwwi7k@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

This patch was referenced in a discussion at PGCon, so I thought I'd give it a
look, even though Bharath said that he won't have time to drive it forward...

On 2021-04-19 10:21:36 +0530, Bharath Rupireddy wrote:
> diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
> index bd5faf0c1f..655de8e6b7 100644
> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -2558,6 +2558,11 @@ static const TableAmRoutine heapam_methods = {
> .tuple_insert_speculative = heapam_tuple_insert_speculative,
> .tuple_complete_speculative = heapam_tuple_complete_speculative,
> .multi_insert = heap_multi_insert,
> + .tuple_insert_begin = heap_insert_begin,
> + .tuple_insert_v2 = heap_insert_v2,
> + .multi_insert_v2 = heap_multi_insert_v2,
> + .multi_insert_flush = heap_multi_insert_flush,
> + .tuple_insert_end = heap_insert_end,
> .tuple_delete = heapam_tuple_delete,
> .tuple_update = heapam_tuple_update,
> .tuple_lock = heapam_tuple_lock,

I don't think we should have multiple callbacks for the insertion APIs in
tableam.h. I think it'd be good to continue supporting the old table_*()
functions, but supporting multiple insert APIs in each AM doesn't make much
sense to me.
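
A minimal sketch of that idea, assuming a simplified model rather than the real
tableam.h: the AM exposes a single insertion API, and the old-style entry
points survive only as wrappers in AM-independent code. Every name here
(ModifyState, AmRoutine, legacy_tuple_insert, legacy_multi_insert) is
hypothetical, and tuples are stand-in ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-operation state. */
typedef struct ModifyState
{
	int			ninserted;
} ModifyState;

/* Hypothetical AM vtable with exactly one insertion API. */
typedef struct AmRoutine
{
	void		(*insert_begin) (ModifyState *state);
	void		(*insert) (ModifyState *state, int tuple);
	void		(*insert_end) (ModifyState *state);
} AmRoutine;

/* A toy AM that only counts what it is given. */
static void demo_begin(ModifyState *s) { s->ninserted = 0; }
static void demo_insert(ModifyState *s, int tuple) { (void) tuple; s->ninserted++; }
static void demo_end(ModifyState *s) { (void) s; }

static const AmRoutine demo_am = {demo_begin, demo_insert, demo_end};

/* Old-style single insert, kept for callers, built on the one AM API. */
static int
legacy_tuple_insert(const AmRoutine *am, int tuple)
{
	ModifyState state;

	assert(am != NULL);
	am->insert_begin(&state);
	am->insert(&state, tuple);
	am->insert_end(&state);
	return state.ninserted;
}

/* Old-style multi insert, also just a wrapper over the same callbacks. */
static int
legacy_multi_insert(const AmRoutine *am, const int *tuples, int ntuples)
{
	ModifyState state;

	assert(am != NULL);
	am->insert_begin(&state);
	for (int i = 0; i < ntuples; i++)
		am->insert(&state, tuples[i]);
	am->insert_end(&state);
	return state.ninserted;
}
```

With this shape an AM author implements three callbacks once, and existing
callers of the old functions keep working unchanged.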

> +/*
> + * GetTupleSize - Compute the tuple size given a table slot.
> + *
> + * For heap tuple, buffer tuple and minimal tuple slot types return the actual
> + * tuple size that exists. For virtual tuple, the size is calculated as the
> + * slot does not have the tuple size. If the computed size exceeds the given
> + * maxsize for the virtual tuple, this function exits, not investing time in
> + * further unnecessary calculation.
> + *
> + * Important Notes:
> + * 1) Size calculation code for virtual slots is being used from
> + * tts_virtual_materialize(), hence ensure to have the same changes or fixes
> + * here and also there.
> + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
> + * virtual slots. Ensure to add related code in case any new slot type is
> + * introduced.
> + */
> +inline Size
> +GetTupleSize(TupleTableSlot *slot, Size maxsize)
> +{
> + Size sz = 0;
> + HeapTuple tuple = NULL;
> +
> + if (TTS_IS_HEAPTUPLE(slot))
> + tuple = ((HeapTupleTableSlot *) slot)->tuple;
> + else if(TTS_IS_BUFFERTUPLE(slot))
> + tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
> + else if(TTS_IS_MINIMALTUPLE(slot))
> + tuple = ((MinimalTupleTableSlot *) slot)->tuple;
> + else if(TTS_IS_VIRTUAL(slot))

I think this embeds too much knowledge of the set of slot types in core
code. I don't see why it's needed either?

> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 414b6b4d57..2a1470a7b6 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
> TM_IndexStatus *status;
> } TM_IndexDeleteOp;
>
> +/* Holds table insert state. */
> +typedef struct TableInsertState

I suspect we should design it to be usable for updates and deletes in the
future, and thus name it TableModifyState.

> +{
> + Relation rel;
> + /* Bulk insert state if requested, otherwise NULL. */
> + struct BulkInsertStateData *bistate;
> + CommandId cid;

Hm - I'm not sure it's a good idea to force the cid to be the same for all
inserts done via one TableInsertState.

> + int options;
> + /* Below members are only used for multi inserts. */
> + /* Array of buffered slots. */
> + TupleTableSlot **mi_slots;
> + /* Number of slots that are currently buffered. */
> + int32 mi_cur_slots;

> + /*
> + * Access method specific information such as parameters that are needed
> + * for buffering and flushing decisions can go here.
> + */
> + void *mistate;

I think we should instead have a generic TableModifyState, which each AM then
embeds into its own AM-specific state. Forcing two very related structs to be
allocated separately doesn't seem wise in this case.
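
Roughly, the embedding could look like the sketch below. This is an
illustration under assumptions, not code from the patch: HeapModifyState, the
heap_modify_*() functions, the max_buffered value, and the void * stand-in for
Relation are all hypothetical. The point is only that the generic struct is the
first member of the AM's struct, so one allocation covers both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Generic part, the only thing AM-independent code looks at. */
typedef struct TableModifyState
{
	void	   *rel;			/* stand-in for Relation */
	int			options;
} TableModifyState;

/* Hypothetical heap-specific state; the generic part must come first. */
typedef struct HeapModifyState
{
	TableModifyState base;
	int			nbuffered;		/* AM-private buffering details */
	int			max_buffered;
} HeapModifyState;

/* One allocation for both the generic and the AM-specific part. */
static TableModifyState *
heap_modify_begin(void *rel, int options)
{
	HeapModifyState *hstate = calloc(1, sizeof(HeapModifyState));

	assert(hstate != NULL);
	hstate->base.rel = rel;
	hstate->base.options = options;
	hstate->max_buffered = 1000;	/* arbitrary value for the sketch */
	return &hstate->base;
}

/* The AM casts back to its own type; valid because base is the first member. */
static int
heap_modify_buffer(TableModifyState *state)
{
	HeapModifyState *hstate = (HeapModifyState *) state;

	if (hstate->nbuffered >= hstate->max_buffered)
		hstate->nbuffered = 0;	/* pretend we flushed */
	return ++hstate->nbuffered;
}

/* Freeing the generic pointer releases the whole AM-specific struct. */
static void
heap_modify_end(TableModifyState *state)
{
	free(state);
}
```

Compared to a separate void *mistate, this needs no second allocation and no
extra pointer chase on each insert.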

> @@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
> cid, options, bistate);
> }
>
> +static inline TableInsertState*
> +table_insert_begin(Relation rel, CommandId cid, int options,
> + bool alloc_bistate, bool is_multi)

Why have alloc_bistate and options?

> +static inline void
> +table_insert_end(TableInsertState *state)
> +{
> + /* Deallocate bulk insert state here, since it's AM independent. */
> + if (state->bistate)
> + FreeBulkInsertState(state->bistate);
> +
> + state->rel->rd_tableam->tuple_insert_end(state);
> +}

Seems like the order in here should be swapped?
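
To spell out the swapped order with a toy model: the AM's end callback runs
first, and the bulk insert state is released only afterwards. The sketch
assumes that the end callback may still flush buffered tuples that rely on
bistate; that assumption, and all names below (BulkState, InsertState,
am_insert_end, free_bulk_state, insert_end), are hypothetical stand-ins rather
than the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for BulkInsertStateData; records whether it was released. */
typedef struct BulkState
{
	bool		freed;
} BulkState;

typedef struct InsertState
{
	BulkState  *bistate;
	int			pending;		/* tuples still buffered */
	bool		flushed_with_live_bistate;
} InsertState;

/* Toy AM end callback: flushes leftovers, which needs a live bistate. */
static void
am_insert_end(InsertState *state)
{
	if (state->pending > 0)
	{
		state->flushed_with_live_bistate =
			state->bistate != NULL && !state->bistate->freed;
		state->pending = 0;
	}
}

/* Stand-in for FreeBulkInsertState(). */
static void
free_bulk_state(BulkState *bistate)
{
	bistate->freed = true;
}

/* Swapped order: AM callback first, then the AM-independent cleanup. */
static void
insert_end(InsertState *state)
{
	assert(state != NULL);

	am_insert_end(state);

	if (state->bistate)
	{
		free_bulk_state(state->bistate);
		state->bistate = NULL;
	}
}
```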

Greetings,

Andres Freund
