Re: Multi Inserts in CREATE TABLE AS - revived patch

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Luc Vlaming <luc(at)swarm64(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Paul Guo <guopa(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Multi Inserts in CREATE TABLE AS - revived patch
Date: 2020-12-03 08:57:22
Message-ID: CALj2ACXr5d48+f=4P2Hrdox8sq+Jhz1_dU_TS-2V_4A3zb+zxg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 3, 2020 at 1:38 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Nov 30, 2020 at 10:49 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Currently, the logic required for multi inserts (such as buffer slot
> > allocation, flushing, tuple size calculation to decide when to flush,
> > cleanup and so on) is handled outside of the existing tableam APIs.
> > There are a good number of cases where multi inserts can be used, such
> > as the existing COPY, or CTAS and CREATE/REFRESH MATERIALIZED VIEW
> > [proposed in this thread], and INSERT INTO SELECTs [here], which are
> > currently under discussion. Handling the same multi insert logic in
> > many places is error prone and duplicates most of the code. To avoid
> > this, I am proposing generic tableam APIs that can be used in all of
> > these cases and that also give tableam developers the flexibility to
> > implement multi insert logic dependent on the underlying storage
> > engine [1].
> >
> > I would like to seek thoughts/opinions on the proposed new APIs. Once reviewed, I will start implementing them.
>
> IMHO, if we think that something is really specific to the tableam,
> then it makes sense to move it there. But doing so just to avoid
> duplicating code might not be the best idea. Instead, you can write
> some common functions that we call from different places. So if
> something is common and will not vary based on the storage type, we
> can keep it outside the tableam interface and move it into common
> functions to avoid duplication.
>

Thanks for the response. The main design goal of the new APIs is to
give tableam developers the flexibility to implement multi insert logic
that depends on the underlying storage engine. Currently, we follow the
same multi insert logic for every storage engine (when and how to flush
the buffered tuples, how to calculate tuple sizes, and so on), and this
logic doesn't take the capabilities of the underlying storage engine
into account. Please have a look at [1], where this point was brought
up by @Luc Vlaming. The subsequent discussion reached some level of
agreement on the proposed APIs.
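
To illustrate the point, here is a minimal standalone sketch of what
"letting the storage engine decide when to flush" could look like. It is
not the structure or the APIs proposed in [3], and it is not PostgreSQL's
tableam interface: every name in it (MultiInsertOps, MultiInsertState,
heap_like_ops, columnar_like_ops, load_rows) and every threshold value is
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in PostgreSQL. */
typedef struct MultiInsertState MultiInsertState;

/* Callbacks a table AM would supply to control its own buffering policy. */
typedef struct MultiInsertOps
{
    int  (*should_flush)(const MultiInsertState *state);
    void (*flush)(MultiInsertState *state);
} MultiInsertOps;

struct MultiInsertState
{
    const MultiInsertOps *ops;
    int    ntuples;        /* tuples currently buffered */
    size_t nbytes;         /* bytes currently buffered */
    int    nflushes;       /* number of flushes performed so far */
    long   total_written;  /* tuples handed to storage so far */
};

/* Stand-in for the real write path: just account for the batch. */
static void
account_flush(MultiInsertState *state)
{
    state->total_written += state->ntuples;
    state->nflushes++;
    state->ntuples = 0;
    state->nbytes = 0;
}

/* Row-store-like policy: flush on a tuple count OR a byte limit
 * (illustrative values, assumed for this sketch). */
static int
heap_like_should_flush(const MultiInsertState *state)
{
    return state->ntuples >= 1000 || state->nbytes >= 65535;
}

/* Column-store-like policy: prefers much larger batches, ignores bytes. */
static int
columnar_like_should_flush(const MultiInsertState *state)
{
    return state->ntuples >= 10000;
}

const MultiInsertOps heap_like_ops = {heap_like_should_flush, account_flush};
const MultiInsertOps columnar_like_ops = {columnar_like_should_flush, account_flush};

static void
multi_insert_begin(MultiInsertState *state, const MultiInsertOps *ops)
{
    state->ops = ops;
    state->ntuples = 0;
    state->nbytes = 0;
    state->nflushes = 0;
    state->total_written = 0;
}

/* Buffer one tuple, then let the AM decide whether to flush. */
static void
multi_insert_tuple(MultiInsertState *state, size_t tuple_size)
{
    assert(state->ops != NULL);
    state->ntuples++;
    state->nbytes += tuple_size;
    if (state->ops->should_flush(state))
        state->ops->flush(state);
}

/* Flush whatever is still buffered at the end of the load. */
static void
multi_insert_end(MultiInsertState *state)
{
    if (state->ntuples > 0)
        state->ops->flush(state);
}

/* A caller such as COPY or CTAS would not need to know the policy. */
long
load_rows(const MultiInsertOps *ops, int nrows, size_t row_size, int *nflushes)
{
    MultiInsertState state;

    multi_insert_begin(&state, ops);
    for (int i = 0; i < nrows; i++)
        multi_insert_tuple(&state, row_size);
    multi_insert_end(&state);
    *nflushes = state.nflushes;
    return state.total_written;
}
```

In this sketch the caller only sees begin/insert/end, so the same loop
works unchanged whether the policy behind it flushes every 1000 tuples
or buffers ten times as many. Sharing that loop across callers then
falls out as a side effect rather than being the goal.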

I want to clarify that avoiding duplicate multi insert code (for COPY,
CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs) is a byproduct (not a
main design goal) of implementing the new APIs for heap AM. I apologize
for presenting the goal earlier as avoiding duplicate code.

I also want to mention that @Andres Freund envisioned similar APIs
in [2].

I tried to keep the API as generic as possible. Please have a look at
the new structure and APIs in [3].

Thoughts?

[1] - https://www.postgresql.org/message-id/ca3dd08f-4ce0-01df-ba30-e9981bb0d54e%40swarm64.com
[2] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CALj2ACV8_O651C2zUqrVSRFDJkp8%3DTMwSdG9%2BmDGL%2BvF6CD%2BAQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
