Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)
Date: 2016-05-07 13:07:17
Message-ID: CAA4eK1Ky2=HsTsT4hmfL=EAL5rv0_t59tvWzVK9HQKvN6Dovkw@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 6, 2016 at 8:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Andreas Seltenreich <seltenreich(at)gmx(dot)de> writes:
> > when fuzz testing master as of c1543a8, parallel workers trigger the
> > following assertion in ExecInitSubPlan every couple hours.
> > TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)
> > Sample backtraces of a worker and leader below, plan of leader attached.
> > The collected queries don't seem to reproduce it.
>
> Odd. My understanding of the restrictions on parallel query is that
> anything involving a SubPlan ought not be parallelized;
>

Subplan references are considered parallel-restricted, so a parallel plan
can be generated for a query that contains subplans, but the subplans
themselves must not be pushed down to workers. I tried a somewhat simpler
example to see whether anything parallel-restricted gets pushed down to a
worker in the case of joins, and it turns out there are cases where that
can happen. Consider the example below:

create or replace function parallel_func_select() returns integer
as $$
declare
    ret_val int;
begin
    ret_val := 1000;
    return ret_val;
end;
$$ language plpgsql Parallel Restricted;

CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 10000000) g;

CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000000) g;

Explain Verbose SELECT t1.c1 + parallel_func_select(), t2.c1 FROM t1 JOIN
t2 ON t1.c1 = t2.c1;

                                   QUERY PLAN
----------------------------------------------------------------------------------------
 Gather (cost=32813.00..537284.53 rows=1000000 width=8)
   Output: ((t1.c1 + parallel_func_select())), t2.c1
   Workers Planned: 2
   -> Hash Join (cost=31813.00..436284.53 rows=1000000 width=8)
        Output: (t1.c1 + parallel_func_select()), t2.c1
        Hash Cond: (t1.c1 = t2.c1)
        -> Parallel Seq Scan on public.t1 (cost=0.00..95721.08 rows=4166608 width=4)
             Output: t1.c1, t1.c2
        -> Hash (cost=15406.00..15406.00 rows=1000000 width=4)
             Output: t2.c1
             -> Seq Scan on public.t2 (cost=0.00..15406.00 rows=1000000 width=4)
                  Output: t2.c1
(12 rows)

From the above output it is clear that the parallel-restricted function is
pushed down below the Gather node. I found that although we have carefully
avoided pushing the pathtarget below GatherPath in
apply_projection_to_path() when the pathtarget contains any parallel-unsafe
or parallel-restricted clause, we also separately try to apply the
pathtarget to the partial path list, which doesn't seem to be the correct
way even if it is required. I think this was added as part of the parallel
aggregate patch, and it seems to me that it is no longer required after the
changes related to GatherPath in apply_projection_to_path().
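
To make the rule in the paragraph above concrete, here is a minimal
standalone sketch in C. It is a toy model only, not PostgreSQL source code:
every name in it (ToySafety, ToyTarget, toy_place_projection, and so on) is
hypothetical and invented for illustration. It assumes a target list can be
reduced to the parallel-safety level of each of its expressions, and it
shows the intended decision: evaluate the projection below Gather only when
every expression is parallel-safe, otherwise evaluate it above Gather.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three parallel-safety levels. */
typedef enum
{
    TOY_PARALLEL_SAFE,
    TOY_PARALLEL_RESTRICTED,
    TOY_PARALLEL_UNSAFE
} ToySafety;

/* A target list reduced to the safety level of each expression (toy type). */
typedef struct
{
    const ToySafety *exprs;
    size_t           nexprs;
} ToyTarget;

/* Where the projection ends up being evaluated in this toy model. */
typedef enum
{
    TOY_PROJECT_BELOW_GATHER,   /* computed in the workers */
    TOY_PROJECT_ABOVE_GATHER    /* computed in the leader only */
} ToyPlacement;

/* True only if no expression is parallel-restricted or parallel-unsafe. */
static bool
toy_target_is_parallel_safe(const ToyTarget *target)
{
    for (size_t i = 0; i < target->nexprs; i++)
    {
        if (target->exprs[i] != TOY_PARALLEL_SAFE)
            return false;
    }
    return true;
}

/*
 * Decide where to evaluate the target list.  A single restricted or unsafe
 * expression keeps the whole projection above Gather.
 */
static ToyPlacement
toy_place_projection(const ToyTarget *target)
{
    return toy_target_is_parallel_safe(target)
        ? TOY_PROJECT_BELOW_GATHER
        : TOY_PROJECT_ABOVE_GATHER;
}
```

In terms of the example query, the target list
(t1.c1 + parallel_func_select(), t2.c1) would be modelled as
{TOY_PARALLEL_RESTRICTED, TOY_PARALLEL_SAFE}, for which
toy_place_projection() returns TOY_PROJECT_ABOVE_GATHER. The plan shown
earlier violates exactly this: the same expression appears in the output of
the Hash Join below Gather.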

With the attached patch applied, parallel-restricted clauses are no longer
added below the Gather path.

Now, back to the original bug: if you look at the plan file attached to the
original bug report, the subplan is pushed below the Gather node in the
target list, not to the immediate join but one more level down, to the
SeqScan path. I am still not sure how it has managed to push the restricted
clauses that far down.

Thoughts?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
avoid_restricted_clause_below_gather_v1.patch application/octet-stream 885 bytes
