Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-09 12:56:04
Message-ID: CAA4eK1KHJnGw4FvvxsmL7v2x-4TM7HTNgCn_4ZMG4B3yaHwbKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 9, 2021 at 5:49 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
>
> > > > postgres=# explain verbose insert into testscan select a from x
> > > > where
> > > > a<80000 or (a%2=0 and a>199900000);
> > > > QUERY PLAN
> > > >
> > --------------------------------------------------------------------
> > > > --
> > > > -----------------------------
> > > > Gather (cost=4346.89..1281204.64 rows=81372 width=0)
> > > > Workers Planned: 4
> > > > -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0
> > > > width=0)
> > > > -> Parallel Bitmap Heap Scan on public.x1
> > > > (cost=3346.89..1272067.44 rows=20343 width=8)
> > > > Output: x1.a, NULL::integer
> > > > Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000))
> > > > Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND
> > > > (x1.a >
> > > > 199900000)))
> > > > -> BitmapOr (cost=3346.89..3346.89 rows=178808
> > > > width=0)
> > > > -> Bitmap Index Scan on x1_a_idx
> > > > (cost=0.00..1495.19 rows=80883 width=0)
> > > > Index Cond: (x1.a < 80000)
> > > > -> Bitmap Index Scan on x1_a_idx
> > > > (cost=0.00..1811.01 rows=97925 width=0)
> > > > Index Cond: (x1.a > 199900000)
> > > >
> > > > PSA is my postgresql.conf file, maybe you can have a look. Besides,
> > > > I didn't do any parameters tuning in my test session.
> > >
> > > I reproduced this on my machine.
> > >
> > > I think we'd better do "analyze" before insert which helps reproduce this
> > easier.
> > > Like:
> > >
> > > -----
> > > analyze;
> > > explain analyze verbose insert into testscan select a from x where
> > > a<80000 or (a%2=0 and a>199900000);
> > > -----
> >
> > OK then.
> > Can you check if just the underlying SELECTs are run (without INSERT), is
> > there any performance degradation when compared to a non-parallel scan?
>
> It seems there is no performance degradation without insert.
>
> Till now, what I found is that:
> With tang's conf, when doing parallel insert, the walrecord is more than serial insert
> (IMO, this is the main reason why it has performance degradation)
> See the attatchment for the plan info.
>
> I have tried alter the target table to unlogged and
> then the performance degradation will not happen any more.
>
> And the additional walrecord seems related to the index on the target table.
>

I think you might want to see which exact WAL records are extra by
using pg_waldump?

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2021-02-09 13:40:02 Re: Perform COPY FROM encoding conversions in larger chunks
Previous Message Dilip Kumar 2021-02-09 12:44:58 Re: [HACKERS] Custom compression methods