RE: Parallel INSERT (INTO ... SELECT ...)

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Greg Nancarrow' <gregn4422(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-12-10 02:23:54
Message-ID: OSBPR01MB298274BCB8959932191EA9A6FECB0@OSBPR01MB2982.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
> Firstly, in order to perform parallel-safety checks in the case of partitions, the
> patch currently recursively locks/unlocks
> (AccessShareLock) each partition during such checks (as each partition may
> itself be a partitioned table). Is there a better way of performing the
> parallel-safety checks and reducing the locking requirements?

First of all, as you demonstrated with the planning and execution times of parallel insert, I think the increased planning time is negligible when parallel insert is intentionally used to load large amounts of data. However, it is a problem if the overhead is imposed on OLTP transactions. Does the overhead occur with the default values of max_parallel_workers_per_gather = 2 and max_parallel_workers = 8?
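For reference, those defaults can be inspected, and latency-sensitive OLTP sessions can switch parallelism off per session (a minimal illustration using the standard GUCs):

```sql
-- Inspect the current parallel-worker settings
SHOW max_parallel_workers_per_gather;  -- default 2
SHOW max_parallel_workers;             -- default 8

-- Disable parallel plans for this session only, e.g. for OLTP work
SET max_parallel_workers_per_gather = 0;
```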

To avoid this heavy checking during planning, I'm wondering if we can add an attribute to pg_class, something like the existing relhasindex and relhastriggers flags. The concern is that we would have to maintain the accuracy of the value when dropping ancillary objects on the table/partition.
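As an illustration of the existing precedent, planning already consults cheap per-relation flags in pg_class (the table name below is made up, and "relparallelsafe" is only a hypothetical name for the proposed flag):

```sql
-- Existing per-relation flags that the system already maintains
SELECT relname, relhasindex, relhastriggers
FROM pg_class
WHERE relname = 'my_partitioned_table';

-- The proposal would add a similar flag, e.g. (hypothetical column name):
--   SELECT relname, relparallelsafe FROM pg_class WHERE ...;
```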

> Secondly, I found that when running "make check-world", the
> "partition-concurrent-attach" test fails, because it is expecting a partition
> constraint to be violated on insert, while an "alter table attach partition ..." is
> concurrently being executed in another transaction. Because of the partition
> locking done by the patch's parallel-safety checking code, the insert blocks on
> the exclusive lock held by the "alter table" in the other transaction until the
> transaction ends, so the insert ends up successfully completing (and thus fails
> the test) when the other transaction ends. To overcome this test failure, the
> patch code was updated to instead perform a conditional lock on the partition,
> and on failure (i.e. because of an exclusive lock held somewhere else), just
> assume it's parallel-unsafe because the parallel-safety can't be determined
> without blocking on the lock. This is not ideal, but I'm not sure of what other
> approach could be used and I am somewhat reluctant to change that test. If
> anybody is familiar with the "partition-concurrent-attach" test, any ideas or
> insights would be appreciated.

That test looks sane. I think what we should do is disable parallel operation during that test. Some other existing test cases already disable parallel query by setting max_parallel_workers_per_gather to 0. It's not strange for some tests to fail under particular configurations; autovacuum, for example, is disabled in many places in the regression tests.
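Concretely, the test could pin the insert to a serial plan the same way those other tests do (a sketch only; the table names are invented and the exact placement in the partition-concurrent-attach spec would need checking):

```sql
-- In the test's setup (or the inserting session), force a serial plan:
SET max_parallel_workers_per_gather = 0;
INSERT INTO target_part SELECT * FROM source;  -- now never parallel
```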

Rather, I don't think we should introduce the trick of using ConditionalLockAcquire(). With it, the insert would silently be executed serially without the user knowing -- "What? The insert suddenly became several times slower today, and it didn't finish within the planned maintenance window. What's wrong?"
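To spell out the pattern being objected to, here is a pseudocode sketch (not the actual patch code): ConditionalLockRelationOid() is the real lmgr API, but the surrounding function and its name are invented for illustration.

```c
/* Pseudocode sketch, NOT the actual patch: the objectionable pattern. */
static bool
partition_is_parallel_safe(Oid partOid)
{
    if (!ConditionalLockRelationOid(partOid, AccessShareLock))
    {
        /*
         * Lock unavailable: give up and assume parallel-unsafe.  The
         * INSERT then silently runs serially -- the surprise described
         * above.
         */
        return false;
    }
    /* ... recurse into sub-partitions and check expressions ... */
    return true;
}
```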

Regards
Takayuki Tsunakawa
