Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2021-05-27 07:28:48
Message-ID: CALj2ACU22evtV7SLxL27oo3VTQepe7jamKsJeVt77d8YxMy4sQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 27, 2021 at 12:46 PM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
>
> From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> Basically, you are creating a new table and loading data into it, which means you are less likely to access that data soon, so for such an operation spoiling the buffer cache may not be a good idea.
> --------------------------------------------------
>
> Some people, including me, would say that the table will be accessed soon and that's why the data is loaded quickly during minimal maintenance hours.
>
>
> --------------------------------------------------
> I was suggesting this only as an experiment to identify the root cause.
> --------------------------------------------------
>
> I thought this was a good chance to possibly change things for the better (^^).
> I guess the user would simply think like this: "I just want to finish CTAS as quickly as possible, so I configured it to take advantage of parallelism. I want CTAS to make the most use of our resources. Why does Postgres try to limit resource usage (by using the ring buffer) against my will?"

If the idea is to give the user control over whether the separate
ring buffer is used for bulk inserts/writes, then how about exposing
it as a rel option? Currently BAS_BULKWRITE (via GetBulkInsertState)
is used by CTAS, REFRESH MATERIALIZED VIEW, table rewrites
(ATRewriteTable) and COPY. Furthermore, we could make the rel option
an integer and let users specify the size of the ring buffer they want
for a particular bulk insert operation (with an upper limit, of
course, such as shared_buffers or some reasonable amount not exceeding
the RAM of the system).
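
To make the integer rel option idea a bit more concrete, here is a
rough standalone sketch of how a requested size could be turned into
a ring size in buffers. This is not existing PostgreSQL code: the
function name bulkwrite_ring_buffers, the meaning of the values (-1
for "not set", 0 for "no ring"), the 8kB block size and the 16MB
default are all assumptions made for illustration only.

```c
#include <assert.h>

/* Assumptions for illustration, not taken from the PostgreSQL source. */
#define SKETCH_BLCKSZ           8192        /* block size in bytes */
#define SKETCH_DEFAULT_RING_KB  (16 * 1024) /* default ring size: 16MB */

/*
 * Hypothetical helper: map a per-table option value to a ring size,
 * expressed in buffers.
 *
 *   requested_kb < 0   option not set, fall back to the default ring
 *   requested_kb == 0  ring disabled; returns 0, meaning "use shared
 *                      buffers like any other write"
 *   requested_kb > 0   requested ring size in kB, capped at nbuffers
 *
 * nbuffers is the total number of shared buffers.
 */
static int
bulkwrite_ring_buffers(int requested_kb, int nbuffers)
{
    long        ring;

    assert(nbuffers > 0);

    if (requested_kb == 0)
        return 0;
    if (requested_kb < 0)
        requested_kb = SKETCH_DEFAULT_RING_KB;

    /* Convert kB to a number of buffers, rounding down. */
    ring = ((long) requested_kb * 1024) / SKETCH_BLCKSZ;

    /* A ring needs at least one buffer and cannot exceed shared buffers. */
    if (ring < 1)
        ring = 1;
    if (ring > nbuffers)
        ring = nbuffers;

    return (int) ring;
}
```

With 16384 shared buffers (128MB at 8kB per block), an unset option
would give a ring of 2048 buffers, a request of 1GB would be capped
at 16384 buffers, and 0 would bypass the ring entirely. Whether the
cap should be all of shared buffers or only a fraction of it is one
of the things to settle in the discussion.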

I think we can discuss this in a separate thread and see what other
hackers think.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
