
Re: Parallel Inserts in CREATE TABLE AS

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	"Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject:	Re: Parallel Inserts in CREATE TABLE AS
Date:	2021-05-26 11:58:10
Message-ID:	CALj2ACXhYC+Nt6UdGR_2fD3P3twm1oFS=HZoRePANFV-QjZGqg@mail.gmail.com
Lists:	pgsql-hackers

On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> >
> > I analyzed the performance of parallel inserts in CTAS for different
> > cases with tuple sizes of 32 bytes, 59 bytes, 241 bytes and 1064 bytes.
> > We gain when the tuple size is small. But if the tuple size is larger,
> > i.e. 1064 bytes, there's a regression with parallel inserts. Upon
> > further analysis, it turned out that the parallel workers frequently
> > need extra blocks added while concurrently extending the relation
> > (in RelationAddExtraBlocks), and the majority of the time is spent
> > flushing those new empty pages/blocks to disk.
> >
>
> How have you ensured that the cost is due to the flushing of pages?

I think I was wrong to say only that the problem is with the flushing
of empty pages when bulk extending the relation. I should have said the
problem is with the "relation extension lock", but I will hold off on
that claim until I capture the relation extension lock wait events for
the regression-causing cases. I will share the information soon.

> AFAICS, we don't flush the pages rather just write them and then
> register those to be flushed by checkpointer, now it is possible that
> the checkpointer sync queue gets full and the backend has to write by
> itself but have we checked that? I think we can check via wait events,
> if it is due to flush then we should see a lot of file sync
> (WAIT_EVENT_DATA_FILE_SYNC) wait events.

I will also capture the data file sync events along with relation
extension lock wait events.
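
A minimal sketch of how both kinds of wait events could be tallied, assuming pg_stat_activity is sampled repeatedly while the CTAS runs. The helper names (tally_wait_events, share) and the sampling approach are illustrative assumptions, not part of the patch. In pg_stat_activity the relation extension lock appears as wait_event_type 'Lock' with wait_event 'extend', and WAIT_EVENT_DATA_FILE_SYNC appears as 'IO' / 'DataFileSync'.

```python
from collections import Counter

# Query to run periodically (for example once per second) from a separate
# session while the CREATE TABLE AS is executing. Each execution yields one
# "sample": a list of (wait_event_type, wait_event) rows.
SAMPLE_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
  AND backend_type IN ('client backend', 'parallel worker')
"""

# How the two events of interest are reported in pg_stat_activity.
EXTENSION_LOCK = ("Lock", "extend")      # relation extension lock
DATA_FILE_SYNC = ("IO", "DataFileSync")  # WAIT_EVENT_DATA_FILE_SYNC


def tally_wait_events(samples):
    """Count how often each (wait_event_type, wait_event) pair was observed.

    `samples` is an iterable of samples; each sample is an iterable of
    (wait_event_type, wait_event) tuples as returned by SAMPLE_QUERY.
    """
    counts = Counter()
    for rows in samples:
        for wait_event_type, wait_event in rows:
            counts[(wait_event_type, wait_event)] += 1
    return counts


def share(counts, event):
    """Return the fraction of all observed waits that were `event`."""
    total = sum(counts.values())
    return counts[event] / total if total else 0.0
```

Comparing share(counts, EXTENSION_LOCK) with share(counts, DATA_FILE_SYNC) for the regression-causing cases would indicate which of the two explanations the samples support. Sampling only approximates time spent waiting, so short waits can be missed.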

> The other possibility could
> be that the free pages added to FSM by one worker are not being used
> by another worker due to some reason. Can we debug and check if the
> pages added by one worker are being used by another worker?

I tried to explain it at [1]. Please have a look. It looks like the
burden is more on the "relation extension lock" and the way the extra
new blocks are getting added.

[1] https://www.postgresql.org/message-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: Parallel Inserts in CREATE TABLE AS at 2021-05-21 10:16:30 from Amit Kapila

Responses

Re: Parallel Inserts in CREATE TABLE AS at 2021-05-26 12:21:10 from Amit Kapila
