Re: Allow parallel plan for referential integrity checks?

From: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Frédéric Yhuel <frederic(dot)yhuel(at)dalibo(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jacob Champion <jchampion(at)timescale(dot)com>
Subject: Re: Allow parallel plan for referential integrity checks?
Date: 2023-08-16 09:18:25
Message-ID: CAC+AXB0QTc3V+019Y3QvEViuu7H9PaYgqMA39JYmR62cxovf9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 10, 2023 at 5:06 PM Juan José Santamaría Flecha <
juanjo(dot)santamaria(at)gmail(dot)com> wrote:

> On Tue, Jul 4, 2023 at 9:45 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>
>> As there is no new patch submitted I will go ahead and do that, please
>> feel
>> free to resubmit when there is renewed interest in working on this.
>>
>
> Recently I restored a database from a directory format backup and having
> this feature would have been quite useful. So, I would like to resume the
> work on this patch, unless OP or someone else is already on it.
>

Hearing no one advising me otherwise I'll pick up the work on the patch.
First, I'll try to address all previous comments.

On Mon, Feb 14, 2022 at 3:34 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Feb 7, 2022 at 5:26 AM Frédéric Yhuel <frederic(dot)yhuel(at)dalibo(dot)com>
> wrote:
> > I noticed that referential integrity checks aren't currently
> > parallelized. Is it on purpose?
>
> It's not 100% clear to me that it is safe. But on the other hand, it's
> also not 100% clear to me that it is unsafe.
>
> Generally, I only added CURSOR_OPT_PARALLEL_OK in places where I was
> confident that nothing bad would happen, and this wasn't one of those
> places. It's something of a nested-query environment -- your criterion
> #6. How do we know that the surrounding query isn't already parallel?
> Perhaps because it must be DML, but I think it must be more important
> to support parallel DML than to support this.
>

Currently I don't see a scenario where the validation might run inside
another parallel query, but I could be missing something. Suggestions on
some edge cases to test are more than welcome.

We already have a case where parallel operations can occur when adding a
table constraint, and that's the index creation for a primary key. Take for
example:

postgres=# SET max_parallel_maintenance_workers TO 4;
postgres=# SET parallel_setup_cost TO 0;
postgres=# SET parallel_tuple_cost TO 0;
postgres=# SET parallel_leader_participation TO 0;
postgres=# SET min_parallel_table_scan_size TO 0;
postgres=# SET client_min_messages TO debug1;
postgres=# CREATE TABLE parallel_pk_table (a int) WITH (autovacuum_enabled
= off);
postgres=# ALTER TABLE parallel_pk_table ADD PRIMARY KEY (a);
DEBUG: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"parallel_pk_table_pkey" for table "parallel_pk_table"
DEBUG: building index "parallel_pk_table_pkey" on table
"parallel_pk_table" with request for 1 parallel workers
...
DEBUG: index "parallel_pk_table_pkey" can safely use deduplication
DEBUG: verifying table "parallel_pk_table"

This should be better documented. Maybe "parallel index build" [1] could
have its own section, so it can be cited from other sections?

> I'm not sure what the right thing to do here is, but I think it would
> be good if your patch included a test case.
>

I have included a basic test for parallel ADD PRIMARY KEY and ADD FOREIGN
KEY.

On Sat, Mar 19, 2022 at 1:57 AM Imseih (AWS), Sami <simseih(at)amazon(dot)com>
wrote:

> It would be valuable to add logging to ensure that the ActiveSnapshot and
> TransactionSnapshot
> is the same for the leader and the workers. This logging could be tested
> in the TAP test.
>

That machinery is used in any parallel query, I'm not sure this patch
should be responsible for carrying that requirement.

Also, inside RI_Initial_Check you may want to set max_parallel_workers to
> max_parallel_maintenance_workers.
>
> Currently the work_mem is set to maintenance_work_mem. This will also
> require
> a doc change to call out.
>

Done.

PFA the patch.

[1] https://www.postgresql.org/docs/current/sql-createindex.html

Regards,

Juan José Santamaría Flecha

Attachment Content-Type Size
v2-0001-Allow-parallel-plan-for-referential-integrity-checks.patch application/octet-stream 5.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2023-08-16 09:22:31 Re: postgres_fdw: wrong results with self join + enable_nestloop off
Previous Message Peter Smith 2023-08-16 09:14:18 Add 'worker_type' to pg_stat_subscription