RE: Parallel INSERT (INTO ... SELECT ...)

From: "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-01-18 07:32:24
Message-ID: 2cf243f1ad7f4b808574c78840790f9d@G08CNEXMBPEKD05.g08.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > Can we test cases when we have few rows in the Select table (say
> > 1000) and there 500 or 1000 partitions. In that case, we won't
> > select parallelism but we have to pay the price of checking
> > parallel-safety of all partitions. Can you check this with 100, 200,
> > 500, 1000 partitions table?
>
> I also wanted to see such an extreme(?) case. The 1,000 rows is not
> the count per partition but the total count of all partitions.e.g.,
> when # of partitions is 100, # of rows per partition is 10.

Below results are in serial plan which select table total rows are 1,000. The Excution Time + Planning Time is still less than unpatched.
(does this patch make some optimizes in serial insert? I'm a little confused here, Because the patched execution time is less than unpatched, but I didn't find information in commit messages about it. If I missed something, please kindly let me know.)

| patched | master | %reg |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms) | Execution Time(ms) | Planning Time(ms) | %reg(Excution Time) | %reg(all time) |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
100 | 5.294 | 1.581 | 6.951 | 0.037 | -24% | -2% |
200 | 9.666 | 3.068 | 13.681 | 0.043 | -29% | -7% |
500 | 22.742 | 12.061 | 35.928 | 0.125 | -37% | -3% |
1000 | 46.386 | 24.872 | 75.523 | 0.142 | -39% | -6% |

I did another test which made check overhead obvious. this case is not fitting for partition purpose, but I put it here as an extreme case too.
Select table total rows are 1,000, # of partitions is 2000. So only the first 1000 partitions have 1 row per partition, the last 1000 partitions have no data inserted.

| patched | master | %reg |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
partitions |Execution Time(ms)| Planning Time(ms) | Execution Time(ms) | Planning Time(ms) | %reg(Excution Time) | %reg(all time) |
-----------|------------------|--------------------|--------------------|-------------------|---------------------|-----------------|
2000 | 45.758 | 51.697 | 80.272 | 0.136 | -43 | 21% |

Regards,
Tang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2021-01-18 07:34:44 RE: Stronger safeguard for archive recovery not to miss data
Previous Message Tatsuro Yamada 2021-01-18 07:31:56 Re: list of extended statistics on psql