Re: parallel append vs. simple UNION ALL

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: parallel append vs. simple UNION ALL
Date: 2018-03-05 18:57:15
Message-ID: CA+TgmoZb9zMtm7a2KYJVhiZ8Af-T+26oSz2PFOrH2wigCD=80Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 27, 2018 at 6:21 AM, Rajkumar Raghuwanshi
<rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
> I have applied 0001 on pg-head, and while playing with regression tests, it
> crashed with below test case.
>
> postgres=# SET min_parallel_table_scan_size=0;
> SET
> postgres=# SELECT * FROM information_schema.foreign_data_wrapper_options
> ORDER BY 1, 2, 3;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Hmm, nice catch. I think part of the problem here is that commit
69f4b9c85f168ae006929eec44fc44d569e846b9, wherein Andres introduced
the ProjectSet node, didn't really do the right thing in terms of
testing parallel-safety. Before that commit, is_parallel_safe(root,
(Node *) scanjoin_target->exprs)) was really testing whether the tlist
produced during the scan/join phase was parallel-safe. However, after
that commit, scanjoin_target->exprs wasn't really the final target for
the scan/join phase any more; instead, it was the first of possibly
several targets computed by split_pathtarget_at_srfs(). Really, the
right thing to do is to test the *last* element in that list for
parallel-safety, but as the code stands we end up testing the *first*
element. So, if there's a parallel-restricted item in the target list
(this query ends up putting a CoerceToDomain in the target list, which
we currently consider parallel-restricted), it looks we can
nevertheless end up trying to project it in what is supposed to be a
partial path.

There are a large number of cases where this doesn't end up mattering
because the next upper_rel created will not get marked
consider_parallel because its target list will also contain the same
parallel-restricted construct, and therefore the partial paths
generated for the scan/join rel will never get used -- except to
generate Gather/Gather Merge paths, which has already happened; but
that step didn't know about the rest of the scan/join targets either,
so it won't have used them. However, both create_distinct_paths() and
the code in grouping_planner that populates final_rel assume that they
don't need to retest the target for parallel-safety because no
projection is done at those levels; they just inherit the
parallel-safety marking of the input rel, so in those cases if the
input rel's marking is wrong the result is populated upward.

There's another way final_rel->consider_parallel can be wrong, too: if
the FROM-list is empty, then we create a join rel and set its
consider_parallel flag without regard to the parallel-safety of the
target list. There are comments in query_planner() says that this
will be dealt with "later", but this seems not to be true. (Before
Tom's commit da1c91631e3577ea5818f855ebb5bd206d559006, the comments
simply ignored the question of whether a check was needed here, but
Tom seems to have inserted an incorrect justification for the
already-wrong code.)

I'm not sure to what degree, if at all, any of these problems are
visible given that we don't use final_rel->consider_parallel for much
of anything. Certainly, it gets much easier to trigger a problem with
0001 applied, as the test case shows. But I'm not entirely convinced
that there's no problem even without that. It seems like every upper
rel that is setting its consider_parallel flag based on the first
element of some list of targets rather than the last is potentially
vulnerable to ending up with the wrong answer, and I'm afraid that
might have some adverse consequence that I haven't quite pinned down
yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-03-05 19:19:12 Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly
Previous Message Alexander Kuzmenkov 2018-03-05 18:46:04 Re: IndexJoin memory problem using spgist and boxes