Fixing Grittner's planner issues

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Fixing Grittner's planner issues
Date: 2009-02-05 17:05:01
Message-ID: 21139.1233853501@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I looked into the planning problems in HEAD discussed here:
http://archives.postgresql.org/pgsql-hackers/2009-02/msg00120.php
As I suspected, there are really two independent bugs there,
though both are in the new semi/antijoin planning code.

The first problem is the one already mentioned: cost_mergejoin fails to
penalize semi/anti joins for nonunique merge keys, although nonunique
keys require rescanning of parts of the inner relation and can result in
huge runtime increases. After unsuccessfully fooling with some quick
and dirty fixes, I remembered why I'd tried to dodge the problem: the
correct way to estimate the amount of rescanning done involves estimating
the merge condition selectivity as if it were a plain inner join
condition, not a semi or anti join. We have code to do that, certainly,
but this means that we will sometimes demand the selectivity of the same
RestrictInfo under JOIN_INNER rules and sometimes under SEMI or ANTI
rules. And there's only space to cache one selectivity there. And
we *really* need to cache both because cost_mergejoin tends to get done
a large number of times in a complex join problem.

The only good solution I can see is to add another selectivity cache
field to RestrictInfo so that we can cache both the JOIN_INNER
selectivity and the "native" selectivity of the qual's context. This
is slightly annoying from a space consumption point of view, but it's
not fatal.

The second problem is specific to the particular structure of Kevin's
query:

SELECT ... FROM
"Case" "C"
LEFT OUTER JOIN "CaseDispo" "CD"
ON ("CD"."caseNo" = "C"."caseNo") AND ("CD"."countyNo" = "C"."countyNo")
AND (NOT (EXISTS (SELECT 1 FROM "CaseDispo" "CD2"
WHERE ("CD2"."caseNo" = "CD"."caseNo")
AND ("CD2"."countyNo" = "CD"."countyNo")
AND ("CD2"."dispoDate" > "CD"."dispoDate"))))
WHERE some-clause-that-selects-just-a-few-C-rows

that is, the EXISTS clause is part of the ON condition of an outer join.
If it referred to any variables of the left side of the upper join
(ie, "C" here) then we couldn't convert it to a separate join at all.
Since it refers only to "CD", it's semantically legitimate to push it
down into the right-hand side of the outer join, and then we can convert
it to a lower outer join. The problem is that having done so, the
resulting anti join doesn't commute with the upper outer join. So
we're forced to evaluate it first, and in this particular example this
results in a much worse plan than leaving it as a SubLink would produce.
A SubLink isn't evaluated "after" the upper join, exactly, but from a
performance point of view we get that effect because it only gets
evaluated after all the other join conditions have succeeded for a
particular pair of C and CD rows. In this case we only need the answer
for a few C/CD rows so that's a win. (If we needed the answer for most
CD rows, it's not a win; but I don't think we can optimize that case at
the expense of getting miserably slower for cases we used to be good at.)

I think the only fix for this that's practical in the 8.4 time frame is
to give up trying to flatten EXISTS/NOT EXISTS that occur within the ON
condition of an outer join, ie, revert to 8.3's level of intelligence
about this case. It seems like a general purpose solution would involve
somehow being able to separate the semantic effects of an outer join
(in particular, null-insertion) from the time at which it's performed,
and I've really got no idea how to do that or what consequences it would
have for both the planner and the executor.

Reflecting on this further, I suspect there are also some bugs in the
planner's rules about when semi/antijoins can commute with other joins;
but that's not directly causing Kevin's problem, because the rules do
make the correct decision for this case.

So I've got a few "must fix" items here, but before I go off to start
doing that, I wondered if anyone had any comments or better ideas.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2009-02-05 17:29:52 Re: Fixing Grittner's planner issues
Previous Message Bruce Momjian 2009-02-05 15:26:04 Re: BUG #4516: FOUND variable does not work after RETURN QUERY