RE: Determine parallel-safety of partition relations for Inserts

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: RE: Determine parallel-safety of partition relations for Inserts
Date: 2021-01-18 04:57:06
Message-ID: OSBPR01MB29826CFA98FD1AD5A361B2E3FEA40@OSBPR01MB2982.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> We already allow users to specify the degree of parallelism for all
> the parallel operations via guc's max_parallel_maintenance_workers,
> max_parallel_workers_per_gather, then we have a reloption
> parallel_workers and vacuum command has the parallel option where
> users can specify the number of workers that can be used for
> parallelism. The parallelism considers these as hints but decides
> parallelism based on some other parameters like if there are that many
> workers available, etc. Why the users would expect differently for
> parallel DML?

I agree that the user would want to specify the degree of parallelism of DML, too. My simple (probably silly) questions were about INSERT SELECT:

* If the target table has 10 partitions and the source table has 100 partitions, how would the user want to specify parameters?

* If the source and target tables have the same number of partitions, and the user specified different values for parallel_workers and parallel_dml_workers, how many parallel workers would run?

* What would the query plan be like? Something like below? Can we easily support this sort of nested thing?

    Gather
      Workers Planned: <parallel_dml_workers>
      Insert
        Gather
          Workers Planned: <parallel_workers>
          Parallel Seq Scan

> Which memory specific to partitions are you referring to here and does
> that apply to the patch being discussed?

Relation cache and catalog cache, which are not specific to partitions. This patch's current parallel safety check opens and closes all descendant partitions of the target table. That leaves their cache entries in CacheMemoryContext after the SQL statement ends. But as I said, we can consider that it is not a serious problem in this case, because parallel DML would be executed in a limited number of concurrent sessions. I just touched on the memory consumption issue for completeness in comparison with (3).

Regards
Takayuki Tsunakawa
