RE: Parallel INSERT (INTO ... SELECT ...)

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-03-11 06:52:48
Message-ID: TYAPR01MB29901B945C65ABBCD7FDA474FE909@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> Now, coming back to Hou-San's patch to introduce a GUC and reloption
> for this feature, I think both of those make sense to me because when
> the feature is enabled via GUC, one might want to disable it for
> partitioned tables? Do we agree on that part or someone thinks
> otherwise?
>
> The other points to bikeshed could be:
> 1. The name of GUC and reloption. The two proposals at hand are
> enable_parallel_dml and enable_parallel_insert. I would prefer the
> second (enable_parallel_insert) because updates/deletes might not have
> a similar overhead.
>
> 2. Should we keep the default value of GUC to on or off? It is
> currently off. I am fine keeping it off for this release and we can
> always turn it on in the later releases if required. Having said that,
> I see the value in keeping it on because in many cases Insert ...
> Select will be used for large data and there we will see a benefit of
> parallelism and users facing trouble (who have a very large number of
> partitions with less data to query) can still disable the parallel
> insert for that particular table. Also, the other benefit of keeping
> it on till at least the beta period is that this functionality will
> get tested and if we found reports of regression then we can turn it
> off for this release as well.
>
> Thoughts?

TBH, I'm a bit unsure, but the names with _insert would be OK.

The reason why Oracle has ENABLE PARALLEL DML clause and the parallel DML is disabled by default is probably due to the following critical restriction (and they have other less severe restrictions in parallel execution.) Our implementation does not have things like this.

"Serial or parallel statements that attempt to access a table that has been modified in parallel within the same transaction are rejected with an error message."

"A transaction can contain multiple parallel DML statements that modify different tables, but after a parallel DML statement modifies a table, no subsequent serial or parallel statement (DML or query) can access the same table again in that transaction."

OTOH, I'm a bit concerned about whether we would have a common reason to disable parallel INSERT, UPDATE and DELETE. Oracle states space usage difference as follows. I wonder if something similar would apply to our parallel UPDATE/DELETE, particularly UPDATE. At least, we already know parallel INSERT results in larger tables and indexes compared to serial execution.

"Parallel UPDATE uses the existing free space in the object, while direct-path INSERT gets new extents for the data."

"Space usage characteristics may be different in parallel than serial execution because multiple concurrent child transactions modify the object."

Even if that's the case, we can add _update parameters, although we feel sorry to cause users trouble to have to be aware of multiple parameters.

Or, when it has turned out that we need _update and/or _delete parameters, we can opt to rename _insert to _dml, and keep the _insert parameter as an old, hidden synonym.

Regards
Takayuki Tsunakawa

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-03-11 06:55:11 Re: Freeze the inserted tuples during CTAS?
Previous Message Michael Paquier 2021-03-11 06:34:24 Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)