Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: 2024-04-29 06:06:20
Message-ID: CALj2ACWTrx1zxWvq8Uj2rEwCsDgQHeJ53WdvzZUw3kW+_VPG6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 25, 2024 at 10:11 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Wed, 2024-04-24 at 18:19 +0530, Bharath Rupireddy wrote:
> > I added a flush callback named TableModifyBufferFlushCallback; when
> > provided by callers invoked after tuples are flushed to disk from the
> > buffers but before the AM frees them up. Index insertions and AFTER
> > ROW INSERT triggers can be executed in this callback. See the v19-
> > 0001 patch for how AM invokes the flush callback, and see either v19-
> > 0003 or v19-0004 or v19-0005 for how a caller can supply the callback
> > and required context to execute index insertions and AR triggers.
>
> The flush callback takes a pointer to an array of slot pointers, and I
> don't think that's the right API. I think the callback should be called
> on each slot individually.
>
> We shouldn't assume that a table AM stores buffered inserts as an array
> of slot pointers. A TupleTableSlot has a fair amount of memory overhead
> (64 bytes), so most AMs wouldn't want to pay that overhead for every
> tuple. COPY does, but that's because the number of buffered tuples is
> fairly small.

I get your point. An AM can choose to implement the buffering strategy
by just storing the plain tuple rather than the tuple slots in which
case the flush callback with an array of tuple slots won't work.
Therefore, I now changed the flush callback to accept only a single
tuple slot.

> > > 11. Deprecate the multi_insert API.
> >
> > I did remove both table_multi_insert and table_finish_bulk_insert in
> > v19-0006.
>
> That's OK with me. Let's leave those functions out for now.

Okay. Dropped the 0006 patch for now.

Please see the attached v20 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v20-0001-Introduce-new-Table-Access-Methods-for-single-an.patch application/x-patch 20.5 KB
v20-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch application/x-patch 7.0 KB
v20-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch application/x-patch 9.2 KB
v20-0004-Optimize-Logical-Replication-apply-with-multi-in.patch application/x-patch 19.7 KB
v20-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch application/x-patch 14.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-04-29 06:08:14 Re: Synchronizing slots from primary to standby
Previous Message Tom Lane 2024-04-29 05:32:40 Re: A failure in prepared_xacts test