From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> |
Subject: | Re: Parallel Inserts in CREATE TABLE AS |
Date: | 2021-05-26 11:58:10 |
Message-ID: | CALj2ACXhYC+Nt6UdGR_2fD3P3twm1oFS=HZoRePANFV-QjZGqg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, May 21, 2021 at 3:46 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> >
> > I analyzed performance of parallel inserts in CTAS for different cases
> > with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
> > gain if the tuple sizes are lower. But if the tuple size is larger
> > i.e. 1064bytes, there's a regression with parallel inserts. Upon
> > further analysis, it turned out that the parallel workers are
> > requiring frequent extra blocks addition while concurrently extending
> > the relation (in RelationAddExtraBlocks) and the majority of the time
> > spent is going into flushing those new empty pages/blocks onto the
> > disk.
> >
>
> How have you ensured that the cost is due to the flushing of pages?

I think I was wrong to say only that the problem is the flushing of
empty pages when bulk-extending the relation. I should have said the
problem is with the "relation extension lock", but I will hold off on
that claim until I capture the relation extension lock wait events for
the cases that regress. I will share the information soon.

> AFAICS, we don't flush the pages rather just write them and then
> register those to be flushed by checkpointer, now it is possible that
> the checkpointer sync queue gets full and the backend has to write by
> itself but have we checked that? I think we can check via wait events,
> if it is due to flush then we should see a lot of file sync
> (WAIT_EVENT_DATA_FILE_SYNC) wait events.

I will also capture the data file sync events along with the relation
extension lock wait events.
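
A minimal sketch of how such a capture could be tallied, assuming the
wait events are sampled periodically from pg_stat_activity while the CTAS
runs. The sampling approach, the helper names (tally_wait_events, share)
and the query text are assumptions for illustration, not something taken
from the patch or from this thread. In pg_stat_activity the relation
extension lock shows up as wait_event_type "Lock" with wait_event
"extend", and WAIT_EVENT_DATA_FILE_SYNC as "IO" / "DataFileSync".

```python
from collections import Counter

# Query that could be run repeatedly (for example once per second) while
# the CTAS is executing. Restricting to these backend types is an assumption.
SAMPLE_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
  AND backend_type IN ('client backend', 'parallel worker')
"""

# (wait_event_type, wait_event) pairs of interest.
EXTENSION_LOCK = ("Lock", "extend")       # relation extension lock
DATA_FILE_SYNC = ("IO", "DataFileSync")   # WAIT_EVENT_DATA_FILE_SYNC


def tally_wait_events(samples):
    """Count each (wait_event_type, wait_event) pair across all samples.

    `samples` is an iterable of result sets, one per execution of
    SAMPLE_QUERY; each result set is an iterable of 2-tuples.
    """
    counts = Counter()
    for rows in samples:
        for wait_type, wait_event in rows:
            counts[(wait_type, wait_event)] += 1
    return counts


def share(counts, key):
    """Fraction of all sampled waits attributed to `key` (0.0 if no waits)."""
    total = sum(counts.values())
    return counts[key] / total if total else 0.0
```

Comparing share(counts, EXTENSION_LOCK) against share(counts,
DATA_FILE_SYNC) for the regressing cases would indicate which of the two
explanations the samples support.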
> The other possibility could
> be that the free pages added to FSM by one worker are not being used
> by another worker due to some reason. Can we debug and check if the
> pages added by one worker are being used by another worker?

I tried to explain it at [1]. Please have a look. It looks like the
cost lies more in the "relation extension lock" and in the way the
extra new blocks are added.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com