RE: Parallel INSERT (INTO ... SELECT ...)

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-04 00:56:49
Message-ID: TYAPR01MB2990DA4BCAEB495F999E0C36FEB39@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> > Execute EXPLAIN on Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> > QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> > Insert on public.test_part (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
> > Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> > -> Seq Scan on public.test_data1 (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> > Output: test_data1.a, test_data1.b
> > Buffers: shared hit=5
> > Planning:
> > Buffers: shared hit=27011
> > Planning Time: 24.526 ms
> > Execution Time: 44.981 ms
> >
> > Execute EXPLAIN on non-Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> > QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> > Insert on public.test_part (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
> > Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> > -> Seq Scan on public.test_data1 (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> > Output: test_data1.a, test_data1.b
> > Buffers: shared hit=5
> > Planning:
> > Buffers: shared hit=72
> > Planning Time: 0.135 ms
> > Execution Time: 79.058 ms
> >
>
> So, the results indicate that after the patch we touch more buffers
> during planning which I think is because of accessing the partition
> information, and during execution, the patch touches fewer buffers for
> the same reason. But why this can reduce the time with patch? I think
> this needs some investigation.

I guess another factor other than shared buffers is relcache and catcache. The patched version loads those cached entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched version has to gradually build those cache entries during execution. How can wee confirm its effect?

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-02-04 00:59:03 Re: WIP: BRIN multi-range indexes
Previous Message Zhihong Yu 2021-02-04 00:49:34 Re: WIP: BRIN multi-range indexes