Re: Bug (8.4beta): FailedAssertion("!(bms_is_subset(relids, qualscope))", File: "initsplan.c", Line: 915)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Stefan Huehner <stefan(at)huehner(dot)org>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Bug (8.4beta): FailedAssertion("!(bms_is_subset(relids, qualscope))", File: "initsplan.c", Line: 915)
Date: 2009-05-06 19:47:46
Message-ID: 24884.1241639266@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I wrote:
> I suppose the change I made here
> http://archives.postgresql.org/pgsql-committers/2009-04/msg00329.php
> to improve constant-join-qual handling is what is preventing the
> assertion failure, though I'm still not quite sure why (I'd better look
> closer to see if there is still some form of the bug lurking).

After tracing through this, that change isn't actually related at all.
The bug was introduced last June (so it's in 8.3.recent too) and was
fixed here:
http://archives.postgresql.org/pgsql-committers/2009-04/msg00203.php
The actual issue can be boiled down to approximately this:

select ... from a left join b on something
where false and exists(select 1 from c where something-else)

(IOW, the critical constant-false is actually the occurrence of
"ad_tab_id = to_number(1)" in the outer WHERE, not the one in the
EXISTS as I'd supposed.) What happens is that when
distribute_qual_to_rels is given the constant-false qual to process,
it correctly determines that this should be pushed to the top of the
join tree; which is not where it is syntactically, because at this point
we've rewritten the whole thing to

select ... from
(select ... from a left join b on something
where false)
semi join c on something-else

if you pretend that SQL has SEMI JOIN as a native join type.
So this means we execute these lines:

/* if not below outer join, push it to top of tree */
if (!below_outer_join)
relids =
get_relids_in_jointree((Node *) root->parse->jointree,
false);

after which, relids is no longer a subset of qualscope (since qualscope
is only a+b whereas the entire join tree is a+b+c). So if the
check_outerjoin_delay call a few lines further down happens to return
true, the "Assert(bms_is_subset(relids, qualscope))" after that will
fail.

It formerly was, and now is again, the case that check_outerjoin_delay
can only return true if it enlarges the relids set, which of course is
impossible if relids is already the whole join tree. So that's why the
bug wasn't seen before, even though the logic has been like this for a
long time. The patch I applied last June made it possible for
check_outerjoin_delay to return true if there were any outer join within
the passed-in relids set (which is why the LEFT JOIN is a critical part
of the example). While that patch turned out to be wrong, it seems like
this code is a bit on the fragile side. What I'm thinking is that when
we set relids to the whole jointree, we ought to simultaneously set
qualscope to the same value; that's only one extra assignment and it
will prevent any future recurrence of the crash if check_outerjoin_delay
gets changed again. This behavior is semantically sensible because what
this code is doing is pretending that the qual appeared at the top
syntactic level to begin with; if it had been there then qualscope
would match.

Kind of a long-winded explanation of what will be a one-line patch,
but there you have it.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dave Page 2009-05-06 19:56:03 Re: BUG #4785: Installation fails
Previous Message Peter Much 2009-05-06 19:41:22 WARNING: silent data corruption possible from PL/ruby 0.5.0 (after Ruby upgrade)