RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Robert Haas' <robertmhaas(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [bug?] Missed parallel safety checks, and wrong parallel safety
Date: 2021-05-07 06:21:59
Message-ID: TYAPR01MB29907A6BEF0A5747FEE024D0FE579@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Robert Haas <robertmhaas(at)gmail(dot)com>
> On Wed, May 5, 2021 at 10:54 PM tsunakawa(dot)takay(at)fujitsu(dot)com
> <tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
> > As proposed in this thread and/or "Parallel INSERT SELECT take 2", we
> thought of detecting parallel unsafe function execution during SQL statement
> execution, instead of imposing much overhead to check parallel safety during
> query planning. Specifically, we add parallel safety check in fmgr_info()
> and/or FunctionCallInvoke().
>
> I haven't read that thread, but I don't understand how that can work.
> The reason we need to detect it at plan time is because we might need
> to use a different plan. At execution time it's too late for that.

(I forgot to say this in my previous email. Robert-san, thank you very much for taking time to look at this and giving feedback. It was sad that we had to revert our parallel INSERT SELECT for redesign at the very end of the last CF. We need advice and suggestions from knowledgeable and thoughtful people like Tom-san, Andres-san and you in early stages to not repeat the tragedy.)

I'd really like you to have a look at the first mail in [1], and to get your feedback like "this part should be like ... instead" and "this part would probably work, I think." Without feedback from leading developers, I'm somewhat at a loss if and how we can proceed with the proposed approach.

To put it shortly, we found that it can take non-negligible time for the planner to check the parallel safety of the target table of INSERT SELECT when it has many (hundreds or thousands of) partitions. The check also added much complicated code, too. So, we got inclined to take Tom-san's suggestion -- let the user specify the parallel safety of the target table with CREATE/ALTER TABLE and the planner just decides a query plan based on it. Caching the results of parallel safety checks in relcache or a new shared hash table didn't seem to work well to me, or it should be beyond my brain at least.

We may think that it's okay to just believe the user-specified parallel safety. But I thought we could step up and do our best to check the parallel safety during statement execution, if it's not very invasive in terms of performance and code complexity. The aforementioned idea is that if the parallel processes find the called functions parallel unsafe, they error out. All ancillary objects of the target table, data types, constraints, indexes, triggers, etc., come down to some UDF, so it should be enough to check the parallel safety when the UDF is called.

> Also, it seems potentially quite expensive. A query may be planned
> once and executed many times. Also, a single query execution may call
> the same SQL function many times. I think we don't want to incur the
> overhead of an extra syscache lookup every time anyone calls any
> function. A very simple expression like a+b+c+d+e involves four
> function calls, and + need not be a built-in, if the data type is
> user-defined. And that might be happening for every row in a table
> with millions of rows.

We (optimistically) expect that the overhead won't be serious, because the parallel safety information is already at hand in the FmgrInfo struct when the function is called. We don't have to look up the syscache every time the function is called.

Of course, adding even a single if statement may lead to a disaster in a critical path, so we need to assess the performance. I'd also appreciate if you could suggest some good workload we should experiment in the thread above.

[1]
Parallel INSERT SELECT take 2
https://www.postgresql.org/message-id/TYAPR01MB29905A9AB82CC8BA50AB0F80FE709@TYAPR01MB2990.jpnprd01.prod.outlook.com

Regards
Takayuki Tsunakawa

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Lepikhov 2021-05-07 07:23:00 Re: Removing unneeded self joins
Previous Message Dilip Kumar 2021-05-07 06:20:23 Re: Identify missing publications from publisher while create/alter subscription.