Re: COPY (query) TO ... doesn't allow parallelism

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY (query) TO ... doesn't allow parallelism
Date: 2017-06-03 12:10:08
Message-ID: CAA4eK1+8VA32nNdokuAYv2=8ei_NUhpZ0WyV_N_sAyjAkAexAg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 1, 2017 at 10:16 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-06-01 21:37:56 +0530, Amit Kapila wrote:
>> On Thu, Jun 1, 2017 at 9:34 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > On 2017-06-01 21:23:04 +0530, Amit Kapila wrote:
>> >> On a related note, I think it might be better to have an
>> >> IsInParallelMode() check in this case as we have at other places.
>> >> This is to ensure that if this command is invoked via plpgsql function
>> >> and that function runs is the parallel mode, it will act as a
>> >> safeguard.
>> >
>> > Hm? Which other places do it that way? Isn't standard_planner()
>> > centralizing such a check?
>> >
>>
>> heap_insert->heap_prepare_insert, heap_update, heap_delete, etc.
>
> Those aren't comparable, they're not invoking the planner - and all the
> places that set PARALLEL_OK don't check for it. The relevant check for
> planning is in standard_planner().
>

The standard_planner check is sufficient to avoid generating parallel
plans for such statements, but it won't stop such commands (which
shouldn't be executed by parallel workers) from running in a worker
when they appear inside functions. Consider a hypothetical case:

1. Create a parallel safe function containing Copy commands.
create or replace function parallel_copy(a integer) returns integer
as $$
begin
    Copy (select * from t1 where c1 < 2) to 'e:\\f1';
    return a;
end;
$$ language plpgsql Parallel Safe;

2. Now use this in some command which can be executed in parallel.
explain analyze select * from t1 where c1 < parallel_copy(10);

Without sufficient safeguards, this can allow the Copy command to be
executed by parallel workers. We already try to prohibit that for
plpgsql: _SPI_execute_plan() calls PreventCommandIfParallelMode.
In spite of that, we also keep safeguards in lower-level calls, so
that if the code flow reaches such a command in parallel mode, we
error out. The Copy From code flow already has such a check
(PreventCommandIfParallelMode("COPY FROM");), and I think the Copy To
flow should have one as well.
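
To illustrate the layering, here is a minimal standalone C sketch. It
is not PostgreSQL source: toy_in_parallel_mode,
toy_prevent_command_if_parallel_mode() and toy_copy_to() are
hypothetical stand-ins for the backend's parallel-mode state,
PreventCommandIfParallelMode() and the Copy To entry point, and the
error text is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the backend's parallel-mode state (hypothetical). */
bool toy_in_parallel_mode = false;

/*
 * Hypothetical analogue of PreventCommandIfParallelMode().
 * Returns true if the command may proceed. In parallel mode it writes
 * an error message into errbuf and returns false instead of raising.
 */
bool
toy_prevent_command_if_parallel_mode(const char *cmdname,
                                     char *errbuf, size_t errlen)
{
    assert(cmdname != NULL && errbuf != NULL && errlen > 0);

    if (toy_in_parallel_mode)
    {
        snprintf(errbuf, errlen,
                 "cannot execute %s during a parallel operation", cmdname);
        return false;
    }

    errbuf[0] = '\0';
    return true;
}

/*
 * Toy Copy To entry point. The guard sits in the command itself, so it
 * still fires when the command is reached through a function that was
 * wrongly marked parallel safe and no higher-level check caught it.
 */
bool
toy_copy_to(char *errbuf, size_t errlen)
{
    if (!toy_prevent_command_if_parallel_mode("COPY (query) TO",
                                              errbuf, errlen))
        return false;

    /* The actual copy work would happen here. */
    return true;
}
```

With toy_in_parallel_mode set to false, toy_copy_to() succeeds and
leaves errbuf empty; with it set to true, the call is refused and
errbuf carries the message, whatever path led to the command.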

I agree that the user shouldn't mark such functions as parallel safe
in the first place, but such safeguards protect us from problems when
users have incorrectly marked some functions as parallel safe.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
