Re: BUG #15577: Query returns different results when executed multiple times

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Bartosz Polnik <bartoszpolnik(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15577: Query returns different results when executed multiple times
Date: 2019-01-09 23:27:44
Message-ID: CAEepm=2Re1V5EeeeFRZXvEvAAe9F+GmWSQ3Sf6-+dfBo_of1Zg@mail.gmail.com
Lists: pgsql-bugs

On Thu, Jan 10, 2019 at 12:09 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> > On Thu, Jan 10, 2019 at 10:04 AM Andrew Gierth
> > <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> >> But clearly this can't work if one param is referenced both inside and
> >> outside a Gather, because while they will compare equal for Vars, they
> >> won't actually have the same value thanks to rows coming in from
> >> workers.
>
> > But if they used different params, there could be different problems,
> > no? It's logically the same var.
>
> As far as I can think at the moment, there's no problem with having
> multiple nestloop Params referencing the "same" Var. It could be an
> impediment to optimization if it happened (much) earlier in the planner,
> but for the situation at hand the only code that's going to be looking
> at the tree is the executor and maybe ruleutils, both of which are much
> too stupid to be bothered by such aliasing.

The index scan does actually emit all the tuples it should in your
paragraph "(7)" (maybe why nobody ever noticed this problem before)
but in this case there is also an extra (redundant?) qual referencing
the param, so ExecScan()'s call to ExecQual() returns false after the
other Nest Loop tramples on it, and the tuples are filtered out (I
showed that as "dropped" in my printf-debugging excerpt up-thread).
We'd have to make sure that the qual references the param that is set
by this join and not its evil twin. I'm confused about how that and
any other references to the Var would work, but as you can probably
tell I don't have a great grip on the Var/param system and the
relevant optimisation phases yet.
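
[Illustration, not from the original message] A toy model of the failure described above may help. It is a sketch, not PostgreSQL code: every name in it (params, ToyIndexScan, run_join, trample_with) is hypothetical. It assumes the index key is computed once when the scan is restarted, while the redundant qual reads the shared param slot again for each tuple. If the other Nest Loop overwrites that slot in between, the scan still finds the right tuples but the qual rejects them.

```python
from typing import Dict, List, Optional, Tuple

# Toy model only: none of these names are PostgreSQL APIs.
# A single executor param slot shared by two nest loops (the "evil twins").
params: Dict[int, Optional[int]] = {0: None}


class ToyIndexScan:
    """Inner side of a nest loop: key fixed at rescan, qual checked per tuple."""

    def __init__(self, rows: List[int], param_id: int) -> None:
        self.rows = rows
        self.param_id = param_id
        self.scan_key: Optional[int] = None
        self.rows_removed_by_filter = 0

    def rescan(self) -> None:
        # Assumption: the index key is evaluated once, when the scan restarts.
        self.scan_key = params[self.param_id]

    def fetch(self) -> List[int]:
        out = []
        for row in self.rows:
            if row != self.scan_key:
                continue  # index condition: row is never visited
            if row != params[self.param_id]:
                # Redundant qual, evaluated late against the current slot value.
                self.rows_removed_by_filter += 1
                continue
            out.append(row)
        return out


def run_join(outer_key: int,
             inner_rows: List[int],
             trample_with: Optional[int] = None) -> Tuple[List[int], int]:
    """Run one outer row through the toy join.

    trample_with simulates the other nest loop storing a different value
    into the same param slot after this join has restarted its inner scan.
    """
    scan = ToyIndexScan(inner_rows, param_id=0)
    params[0] = outer_key      # this join sets "its" param
    scan.rescan()              # index key captured here
    if trample_with is not None:
        params[0] = trample_with  # the evil twin overwrites the shared slot
    return scan.fetch(), scan.rows_removed_by_filter
```

With inner rows [5, 7, 7, 9] and an outer key of 7, the undisturbed call returns both matching rows and filters nothing. The same call with trample_with=9 returns no rows and counts two rows as removed by the filter, which is the "dropped" behaviour in miniature.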

Hmm. Why are those ExecQual() -> false cases not showing up as
variation in the "Rows Removed by Filter" counter visible in EXPLAIN
ANALYZE? Then we might have arrived here a lot faster.
InstrCountFiltered1(node, 1) is executed, but somehow the count
doesn't make it into the total shown by EXPLAIN.

> > This makes me wonder if we need
> > some kind of scheme for saving and restoring affected params whenever
> > Gather switches between executing the plan directly and emitting
> > tuples from workers, or something like that...
>
> I don't think we need to (or should) go there if this is the only
> problem. What's worrying me is what other assumptions based on serial
> plan execution are getting broken by injecting Gather into mid levels
> of a plan tree.

parallel_leader_participation = on is a many-headed serpentine beast.

--
Thomas Munro
http://www.enterprisedb.com
