Re: New Table Access Methods for Multi and Single Inserts

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
Date: 2020-12-17 11:05:33
Message-ID: CALj2ACVgT1iocd5nQ+rEmqt3xcCONkR037qbc8PiojdR39Ag=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks a lot for taking a look at the patches.

On Thu, Dec 17, 2020 at 10:35 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> Typos:
>
> + * 1) Specify is_multi as true, then multi insert state is allcoated.
> => allocated
> + * dropped, short-lived memory context is delted and mistate is freed up.
> => deleted
> + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minmal and
> => minimal
> + /* Mulit insert state if requested, otherwise NULL. */
> => multi
> + * Buffer the input slots and insert the tuples from the buffered slots at a
> => *one* at a time ?
> + * Compute the size of the tuple only if mi_max_size i.e. the total tuple size
> => I guess you mean max_size
>
> This variable could use a better name:
> +CopyMulitInsertFlushBuffers(List **mirri, ..
> mirri is fine for a local variable like an element of a struture/array, or a
> loop variable, but not for a function parameter which is an "List" of arbitrary
> pointers.
>
> I think this comment needs to be updated (again) for the removal of the Info
> structure.
> - * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
> + * multi insert buffer items stored in CopyMultiInsertInfo's
>
> There's some superfluous whitespace (and other) changes there which make the
> patch unnecessarily long.

I will correct them and post the next version of the patch set. Before
that, I would like to have the discussion and thoughts on the APIs and
their usefulness.

> I think the COPY patch should be 0002 (or maybe merged into 0001).

I can make it as a 0002 patch.

> You made the v2 insert interface a requirement for all table AMs.
> Should it be optional, and fall back to simple inserts if not implemented ?

I tried to implement the APIs mentioned by Andreas here in [1]. I just
used v2 table am APIs in existing table_insert places to show that it
works. Having said that, if you notice, I moved the bulk insert
allocation and deallocation to the new APIs table_insert_begin() and
table_insert_end() respectively, which make them tableam specific.
Currently, the bulk insert state is outside and independent of
tableam. I think we should not make bulk insert state allocation and
deallocation tableam specific. Thoughts?

[1] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

> For CTAS, I think we need to consider Paul's idea here.
> https://www.postgresql.org/message-id/26C14A63-CCE5-4B46-975A-57C1784B3690%40vmware.com

IMO, if we were to allow those raw insert APIs to perform parallel
inserts, then we would be reimplementing the existing table_insert or
table_mulit_insert API by having some sort of shared memory for
coordinating among workers and so on, may be in some other way. Yes,
we could avoid all the existing locking and shared buffers with those
raw insert APIs, I also feel that we can now do that with the existing
insert APIs for unlogged tables and bulk insert state. To me, the raw
insert APIs after implementing them for the parallel inserts, they
would look like the existing insert APIs for unlogged tables and with
bulk insert state. Thoughts?

Please have a look at [1] for detailed comment.

[1] https://www.postgresql.org/message-id/CALj2ACX0u%3DQvB7GHLEqeVYwvs2eQS7%3D-cEuem7ZaF%3Dp%2BqZ0ikA%40mail.gmail.com

> Conceivably, tableam should support something like that for arbitrary AMs
> ("insert into a new table for which we have exclusive lock"). I think that AM
> method should also be optional. It should be possible to implement a minimal
> AM without implementing every available optimization, which may not apply to
> all AMs, anyway.

I could not understand this point well. Maybe more thoughts help me here.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-12-17 11:13:18 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message Fujii Masao 2020-12-17 09:45:00 Re: STANDBY_LOCK_TIMEOUT may not interrupt ProcWaitForSignal()?