Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: 2024-05-16 19:00:36
Message-ID: af653e36d7648748d697d0d9979240d7449b7c6d.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2024-05-15 at 16:31 -0700, Jeff Davis wrote:
> Even better would be if we could take into account partitioning. That
> might be out of scope for your current work, but it would be very
> useful. We could have a couple new GUCs like modify_table_buffer and
> modify_table_buffer_per_partition or something like that.

To expand on this point:

For heap, the insert buffer is only 1000 tuples, which doesn't take
much memory. But for an AM that does any significant reorganization of
the input data, the buffer may be much larger. For an insert into a
partitioned table, that buffer could be multiplied across many
partitions and become a real concern.
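
To put rough numbers on the multiplication, here is a back-of-envelope
sketch in Python. The average tuple size, the 64 MiB buffer for a
reorganizing AM, and the partition count are all made-up assumptions
for illustration; only the 1000-tuple heap buffer comes from the text
above.

```python
def total_buffer_bytes(partitions, per_partition_bytes):
    """Worst case: every partition holds a full buffer at the same time."""
    return partitions * per_partition_bytes

MIB = 1024 * 1024

# Heap-like buffer: 1000 tuples at an assumed 100 bytes each.
heap_buffer = 1000 * 100

# Hypothetical AM that reorganizes its input and buffers 64 MiB per partition.
big_am_buffer = 64 * MIB

# Assumed partition count for a single INSERT into a partitioned table.
partitions = 1000

heap_total = total_buffer_bytes(partitions, heap_buffer)
big_am_total = total_buffer_bytes(partitions, big_am_buffer)

print(f"heap-like total: {heap_total / MIB:.1f} MiB")
print(f"large-buffer AM total: {big_am_total / MIB:.1f} MiB")
```

Under these assumptions the heap case stays under 100 MiB even with
1000 partitions, while the large-buffer AM would want tens of GiB,
which is the situation a per-partition or overall limit would guard
against.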

We might not need table AM API changes at all here beyond what v21
offers. The ModifyTableState includes the memory context, so that gives
the caller a way to know the memory consumption of a single partition's
buffer. And if it needs to free the resources, it can just call
modify_table_end(), and then _begin() again if more tuples hit that
partition.
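
The control flow described above can be sketched as a toy model. This
is Python, not PostgreSQL code: the class names, the byte counting, and
the "flush the largest buffer first" policy are all assumptions made
for illustration. The private _begin/_end methods stand in for the
begin/end calls mentioned above, and bytes_used stands in for asking
the state's memory context how much it has allocated.

```python
class PartitionBuffer:
    """Stand-in for one partition's insert state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.tuples = []
        self.bytes_used = 0  # models the size of the state's memory context

    def add(self, tup):
        self.tuples.append(tup)
        self.bytes_used += len(tup)


class BufferedInserter:
    """Caller-side logic: keep the sum of all partition buffers under a budget."""

    def __init__(self, total_budget_bytes):
        self.total_budget_bytes = total_budget_bytes
        self.open_buffers = {}  # partition name -> PartitionBuffer
        self.flushed = []       # log of (partition name, tuples written)

    def _begin(self, partition):
        buf = PartitionBuffer(partition)
        self.open_buffers[partition] = buf
        return buf

    def _end(self, partition):
        # Ending the state flushes its tuples and releases its memory.
        buf = self.open_buffers.pop(partition)
        self.flushed.append((partition, len(buf.tuples)))

    def total_bytes(self):
        return sum(b.bytes_used for b in self.open_buffers.values())

    def insert(self, partition, tup):
        buf = self.open_buffers.get(partition)
        if buf is None:
            # Begin again if more tuples hit a partition that was ended earlier.
            buf = self._begin(partition)
        buf.add(tup)
        # Over budget: end the largest buffers until we fit again.
        while self.total_bytes() > self.total_budget_bytes:
            victim = max(self.open_buffers.values(), key=lambda b: b.bytes_used)
            self._end(victim.name)

    def finish(self):
        for partition in list(self.open_buffers):
            self._end(partition)


inserter = BufferedInserter(total_budget_bytes=10)
inserter.insert("p1", b"aaaa")
inserter.insert("p2", b"bbbbb")
inserter.insert("p1", b"ccc")   # pushes the total to 12 bytes, so p1 is ended
inserter.insert("p1", b"dd")    # p1 is begun again with a fresh buffer
inserter.finish()
print(inserter.flushed)
```

Nothing here needs help from the AM beyond reporting memory use and
supporting end followed by a later begin, which is why the budget
policy can live entirely in the caller.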

So I believe what I'm asking for here is entirely orthogonal to the
current proposal.

However, it got me thinking that we might not want to use work_mem for
controlling the heap's buffer size. Each AM is going to have radically
different memory needs, and may have its own (extension) GUCs to
control that memory usage, so they won't honor work_mem. We could
either have a separate GUC for the heap if it makes sense, or we could
just hard-code a reasonable value.

Regards,
Jeff Davis
