Re: BUG #15577: Query returns different results when executed multiple times

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Bartosz Polnik <bartoszpolnik(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15577: Query returns different results when executed multiple times
Date: 2019-01-09 22:08:43
Message-ID: 18883.1547071723@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> So it's not the reading of the tuple queue per se, but the resulting
> visit to a higher part of the plan when the Gather node emits a tuple.
> It can't be right that plan id 2 and plan id 0 are both using paramno
> 0, can it?

So the plan shape, shorn of uninteresting detail, is

Nested Loop Left Join
-> Gather
-> Nested Loop
-> Hash Join
-> Parallel Seq Scan on table_c tc
-> Hash
-> Seq Scan on table_d td
-> Index Scan ... on table_b tba
Index Cond: (target_id = tc.id)
-> Index Scan ... on table_a ta
Index Cond: ((table_c_id = tc.id) AND (date = '2018-08-31'::date))

The lower Nested Loop has a NestLoopParam node telling it to pass the
value of tc.id down to the tba indexscan. The upper one also has a
NestLoopParam node, telling it to pass the value of tc.id down to the ta
indexscan. Since these are for the same Var, the planner believes it's
safe to use the same PARAM_EXEC slot for both of them. And in a normal
plan tree, it'd be right. I think what is happening is

(1) Leader runs the lower nestloop for a bit and gets back a successful
tc/tba join row. At this point its PARAM_EXEC slot is filled with the
tc.id value from that tc row.

(2) It hands this up to the upper nestloop, which assigns that same
tc.id to the PARAM_EXEC slot ... no trouble so far ... and runs the
ta indexscan with that.

(3) Now we go back to the Gather, which I think must go off and look for
rows coming from child workers before it resumes the leader's nestloop.
It gets some row, with a different tc.id value, and hands that back up.

(4) Upper nestloop assigns the tc.id value from the child's row to the
PARAM_EXEC slot, and runs the ta indexscan with that.

(5) Return to the Gather, which now allows the lower nestloop to proceed
in the leader.

(6) Lower nestloop thinks it's not done with the current outer (tc) row,
so it invokes the tba indexscan to see if that can return any more
rows. But since it hasn't advanced to a new outer row, it doesn't
touch the NestLoopParam or PARAM_EXEC slot.

(7) At this point we're resuming the tba indexscan with a different
value for tc.id than it was started with. The results of that are
undefined, IMO, but what's apparently happening is that it fails to
match some rows that it otherwise would have matched. So the end
result is some rows are missing from the output.

One way we could deal with this, perhaps, is to decide that NestLoopParams
above and below a Gather can't share PARAM_EXEC slots. I'm not sure how
we'd mechanize that, other than not allowing NestLoopParams to share
PARAM_EXEC slots *ever*, which might not really be that much of a cost.

But TBH the whole thing frightens me quite a lot as to what other
serial-processing assumptions are getting broken by plopping Gather
into the middle of a plan tree. I wonder whether we'd not be best
off disallowing this sort of plan shape, and/or forbidding the leader
from also executing the parallelized subplan.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-01-09 23:09:36 Re: BUG #15577: Query returns different results when executed multiple times
Previous Message Thomas Munro 2019-01-09 21:46:25 Re: BUG #15577: Query returns different results when executed multiple times