Quick Links

RE: Parallel INSERT (INTO ... SELECT ...)

From:	"tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To:	Greg Nancarrow <gregn4422(at)gmail(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>
Subject:	RE: Parallel INSERT (INTO ... SELECT ...)
Date:	2021-02-04 00:56:49
Message-ID:	TYAPR01MB2990DA4BCAEB495F999E0C36FEB39@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> > Execute EXPLAIN on Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> > QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> > Insert on public.test_part (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
> > Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> > -> Seq Scan on public.test_data1 (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> > Output: test_data1.a, test_data1.b
> > Buffers: shared hit=5
> > Planning:
> > Buffers: shared hit=27011
> > Planning Time: 24.526 ms
> > Execution Time: 44.981 ms
> >
> > Execute EXPLAIN on non-Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> > QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> > Insert on public.test_part (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
> > Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> > -> Seq Scan on public.test_data1 (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> > Output: test_data1.a, test_data1.b
> > Buffers: shared hit=5
> > Planning:
> > Buffers: shared hit=72
> > Planning Time: 0.135 ms
> > Execution Time: 79.058 ms
> >
>
> So, the results indicate that after the patch we touch more buffers
> during planning which I think is because of accessing the partition
> information, and during execution, the patch touches fewer buffers for
> the same reason. But why this can reduce the time with patch? I think
> this needs some investigation.

I guess another factor other than shared buffers is relcache and catcache. The patched version loads those cached entries for all partitions of the insert target table during the parallel-safety check in planning, while the unpatched version has to gradually build those cache entries during execution. How can wee confirm its effect?

Regards
Takayuki Tsunakawa

In response to

Re: Parallel INSERT (INTO ... SELECT ...) at 2021-01-18 09:23:53 from Amit Kapila

Responses

Re: Parallel INSERT (INTO ... SELECT ...) at 2021-02-04 10:46:18 from Amit Kapila
Re: Parallel INSERT (INTO ... SELECT ...) at 2021-02-04 12:05:17 from Greg Nancarrow

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2021-02-04 00:59:03	Re: WIP: BRIN multi-range indexes
Previous Message	Zhihong Yu	2021-02-04 00:49:34	Re: WIP: BRIN multi-range indexes