RE: Determine parallel-safety of partition relations for Inserts

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: RE: Determine parallel-safety of partition relations for Inserts
Date: 2021-01-15 14:04:53
Message-ID: TYAPR01MB29907AE025B60A1C2CA5F08DFEA70@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> This will surely increase planning time but the execution is reduced
> to an extent due to parallelism that it won't matter for either of the
> cases if we see just total time. For example, see the latest results
> for parallel inserts posted by Haiying Tang [3]. There might be an
> impact when Selects can't be parallelized due to the small size of the
> Select-table but we still have to traverse all the partitions to
> determine parallel-safety but not sure how much it is compared to
> overall time. I guess we need to find the same but apart from that can
> anyone think of a better way to determine parallel-safety of
> partitioned relation for Inserts?

Three solutions(?) quickly come to my mind:

(1) Have the user specify whether they want to parallelize DML
Oracle [1] and SQL Server [2] take this approach. Oracle disables parallel DML execution by default. The reason is described as "This mode is required because parallel DML and serial DML have different locking, transaction, and disk space requirements and parallel DML is disabled for a session by default." To enable parallel DML in a session or in a specific statement, you need to run either of the following:

ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ ENABLE_PARALLEL_DML */ …

Besides, the user has to specify a parallel hint in a DML statement, or specify the parallel attribute in CREATE or ALTER TABLE.

SQL Server requires a TABLOCK hint to be specified in the INSERT SELECT statement like this:

INSERT INTO Sales.SalesHistory WITH (TABLOCK) (target columns...) SELECT ...;

(2) Postpone the parallel safety check after the planner finds a worthwhile parallel query plan
I'm not sure if the current planner code allows this easily...

(3) Record the parallel safety in system catalog
Add a column like relparallel in pg_class that indicates the parallel safety of the relation. planner just checks the value instead of doing heavy work for every SQL statement. That column's value is modified whenever a relation alteration is made that affects the parallel safety, such as adding a domain column and CHECK constraint. In case of a partitioned relation, the parallel safety attributes of all its descendant relations are merged. For example, if a partition becomes parallel-unsafe, the ascendant partitioned tables also become parallel-unsafe.

But... developing such code would be burdonsome and bug-prone?

I'm inclined to propose (1). Parallel DML would be something that a limited people run in limited circumstances (data loading in data warehouse and batch processing in OLTP systems by the DBA or data administrator), so I think it's legitimate to require explicit specification of parallelism.

As an aside, (1) and (2) has a potential problem with memory consumption. Opening relations bloat CacheMemoryContext with relcaches and catcaches, and closing relations does not free the (all) memory. But I don't think it could really become a problem in practice, because parallel DML would be run in limited number of concurrent sessions.

[1]
Types of Parallelism
https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/types-parallelism.html#GUID-D8290A02-BE5F-436A-B814-D6FD71CEE81F

[2]
INSERT (Transact-SQL)
https://docs.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver15#best-practices

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2021-01-15 14:18:26 RE: Disable WAL logging to speed up data loading
Previous Message Amit Langote 2021-01-15 13:48:49 Re: POC: postgres_fdw insert batching