Re: [bug?] Missed parallel safety checks, and wrong parallel safety

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Date: 2021-05-06 10:46:56
Message-ID: CA+TgmoZ_iJWWV0c+o7Hp-CByjfs13zQO_kxwNp13qEidS=PC2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 5, 2021 at 10:54 PM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
> (1) Is it better to get hardcoded function properties out of fmgr_builtins[]?
> It's little worth doing so or thinking about that. It's no business for users to change system objects, in this case system functions.

I don't entirely agree with this. Whether or not users have any
business changing system functions, it's better to have one source of
truth than two. Now that being said, this is not a super-important
problem for us to go solve, and hard-coding a certain amount of stuff
is probably necessary to allow the system to bootstrap itself. So for
me it's one of those things that is in.a grey area: if someone showed
up with a patch to make it better, I'd be happy. But I probably
wouldn't spend much time on writing such a patch unless it solved some
other problem that I cared about.

> (3) Why do we want to have parallel-safety in fmgr_builtins[]?
> As proposed in this thread and/or "Parallel INSERT SELECT take 2", we thought of detecting parallel unsafe function execution during SQL statement execution, instead of imposing much overhead to check parallel safety during query planning. Specifically, we add parallel safety check in fmgr_info() and/or FunctionCallInvoke().

I haven't read that thread, but I don't understand how that can work.
The reason we need to detect it at plan time is because we might need
to use a different plan. At execution time it's too late for that.

Also, it seems potentially quite expensive. A query may be planned
once and executed many times. Also, a single query execution may call
the same SQL function many times. I think we don't want to incur the
overhead of an extra syscache lookup every time anyone calls any
function. A very simple expression like a+b+c+d+e involves four
function calls, and + need not be a built-in, if the data type is
user-defined. And that might be happening for every row in a table
with millions of rows.

> (Alternatively, I think we can conclude that we assume parallel unsafe built-in functions won't be used in parallel DML. In that case, we don't change FmgrBuiltin and we just skip the parallel safety check for built-in functions when the function is called. Would you buy this?)

I don't really understand this idea. There's no such thing as parallel
DML, is there? There's just DML, which we must to decide whether can
be done in parallel or not based on, among other things, the
parallel-safety markings of the functions it contains. Maybe I am not
understanding you correctly, but it seems like you're suggesting that
in some cases we can just assume that the user hasn't done something
parallel-unsafe without making any attempt to check it. I don't think
I could support that.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-05-06 11:05:46 Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Previous Message Masahiko Sawada 2021-05-06 10:42:14 Re: decoupling table and index vacuum