RE: [bug?] Missed parallel safety checks, and wrong parallel safety

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [bug?] Missed parallel safety checks, and wrong parallel safety
Date: 2021-06-09 06:20:43
Message-ID: OS0PR01MB571675EBE1882F24554A72D494369@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, June 8, 2021 10:51 PM Robert Haas <robertmhaas(at)gmail(dot)com>
> On Mon, Jun 7, 2021 at 11:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > Note the error is raised after applying the patch, without the patch,
> > the above won't show any error (error message could be improved here).
> > Such cases can lead to unpredictable behavior without a patch because
> > we won't be able to detect the execution of parallel-unsafe functions.
> > There are similar examples from regression tests. Now, one way to deal
> > with similar cases could be that we document them and say we don't
> > consider parallel-safety in such cases and the other way is to detect
> > such cases and error out. Yet another way could be that we somehow try
> > to check these cases as well before enabling parallelism but I thought
> > these cases fall in the similar category as aggregate's support
> > functions.
>
> I'm not very excited about the idea of checking type input and type output
> functions. It's hard to imagine someone wanting to do something
> parallel-unsafe in such a function, unless they're just trying to prove a point. So
> I don't think checking it would be a good investment of CPU cycles. If we do
> anything at all, I'd vote for just documenting that such functions should be
> parallel-safe and that their parallel-safety marks are not checked when they are
> used as type input/output functions. Perhaps we ought to document the same
> thing with regard to opclass support functions, another place where it's hard to
> imagine a realistic use case for doing something parallel-unsafe.
>
> In the case of aggregates, I see the issues slightly differently. I don't know that
> it's super-likely that someone would want to create a parallel-unsafe
> aggregate function, but I think there should be a way to do it, just in case.
> However, if somebody wants that, they can just mark the aggregate itself
> unsafe. There's no benefit for the user to marking the aggregate safe and the
> support functions unsafe and hoping that the system figures it out somehow.
>
> In my opinion, you're basically taking too pure a view of this. We're not trying to
> create a system that does such a good job checking parallel safety markings
> that nobody can possibly find a thing that isn't checked no matter how hard
> they poke around the dark corners of the system. Or at least we shouldn't be
> trying to do that. We should be trying to create a system that works well in
> practice, and gives people the flexibility to easily avoid parallelism when they
> have a query that is parallel-unsafe, while still getting the benefit of parallelism
> the rest of the time.
>
> I don't know what all the cases you've uncovered are, and maybe there's
> something in there that I'd be more excited about changing if I knew what it
> was, but the particular problems you're mentioning here seem more
> theoretical than real to me.

I think another case that parallel unsafe function could be invoked in parallel mode is
the TEXT SEARCH TEMPLATE's init_function or lexize_function. Because currently,
the planner does not check the safety of these function. Please see the example below[1]

I am not sure will user use parallel unsafe function in init_function or lexize_function,
but if user does, it could cause unexpected result.

Does it make sense to add some check for init_function or lexize_function
or document this together with type input/output and opclass support functions ?

[1]----------------------------EXAMPLE------------------------------------
CREATE FUNCTION dsnowball_init(INTERNAL)
RETURNS INTERNAL AS '$libdir/dict_snowball', 'dsnowball_init'
LANGUAGE C STRICT;

CREATE FUNCTION dsnowball_lexize(INTERNAL, INTERNAL, INTERNAL, INTERNAL)
RETURNS INTERNAL AS '$libdir/dict_snowball', 'dsnowball_lexize'
LANGUAGE C STRICT;

CREATE TEXT SEARCH TEMPLATE snowball
(INIT = dsnowball_init,
LEXIZE = dsnowball_lexize);

COMMENT ON TEXT SEARCH TEMPLATE snowball IS 'snowball stemmer';

create table pendtest (ts tsvector);
create index pendtest_idx on pendtest using gin(ts);
insert into pendtest select (to_tsvector('Lore ipsum')) from generate_series(1,10000000,1);
analyze;

set enable_bitmapscan = off;

postgres=# explain select * from pendtest where to_tsquery('345&qwerty') @@ ts;
QUERY PLAN
--------------------------------------------------------------------------------
Gather (cost=1000.00..1168292.86 rows=250 width=31)
Workers Planned: 2
-> Parallel Seq Scan on pendtest (cost=0.00..1167267.86 rows=104 width=31)
Filter: (to_tsquery('345&qwerty'::text) @@ ts)

-- In the example above, dsnowball_init() and dsnowball_lexize() will be executed in parallel mode.

----------------------------EXAMPLE------------------------------------

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-06-09 06:28:29 Re: Fdw batch insert error out when set batch_size > 65535
Previous Message Drouvot, Bertrand 2021-06-09 06:17:28 Re: logical decoding bug: segfault in ReorderBufferToastReplace()