RE: Parallel INSERT (INTO ... SELECT ...)

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: "houzj(dot)fnst(at)cn(dot)fujitsu(dot)com" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 05:43:36
Message-ID: TYAPR01MB2990FF1F4707D3DCD3110BD6FE8D9@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
> Sorry, Maybe the tablename is a little confused,
> but 'testscan_index' is actually a table's name.
...
> Did I miss something ?

No, I don't think so. I just wanted to know the facts correctly. Your EXPLAIN output shows that the target table is testscan as follows. How does testscan_index relate to testscan?

-> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5)

> It did have performance gain, but I think it's not huge enough to ignore the
> extra's index cost.
> What do you think ?

Yes... as you suspect, I'm afraid the benefit from parallel bitmap scan may not compensate for the loss of the parallel insert operation.

The loss is probably due to 1) more index page splits, 2) more buffer writes (table and index), and 3) internal locks for things such as relation extension and page content protection. To investigate 3), we should want something like [1], which tells us the wait event statistics (wait count and time for each wait event) per session or across the instance like Oracke, MySQL and EDB provides. I want to continue this in the near future.

[1]
Add accumulated statistics for wait event
https://commitfest.postgresql.org/28/2332/

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-02-10 06:06:45 Re: Online checksums patch - once again
Previous Message Kyotaro Horiguchi 2021-02-10 05:42:04 Re: [patch] bit XOR aggregate functions