> A perhaps less invasive idea is to discard any proposed mergeclauses
> that are redundant in this sense. This would still require some
> reshuffling of responsibility between select_mergejoin_clauses and
> the code in pathkeys.c, since right now select_mergejoin_clauses
> takes no account of that. However, I'm worried that that might result
> in planner failure on some FULL JOIN cases that work today, since we
> require all the join clauses to be mergejoinable for a FULL JOIN.
> People seem to complain when the planner fails, even for really
> stupid queries ;-). I think this would only be acceptable if we can
> prove that ignoring clauses that are redundant in this sense doesn't
> change the result --- which might be the case, but I'm not sure.
On further study, this doesn't seem nearly as bad as it looked. I had
hastily misread some of the existing code as assuming one-for-one
correspondence of mergeclauses and pathkeys, but actually it just
assumes that the pathkeys list contains at least one pathkey matching
each mergeclause. Redundancy between two pathkeys in a list is
eliminated by allowing adjacent mergeclauses to share the same pathkey.
So that part actually all works, and the only problem is when an
equivalence class is redundant-for-sorting in itself --- that is,
when one of the eclass members is a constant. So that explains why
we'd not seen it before; except in very odd corner cases, the old code
would have eliminated the mergejoinable clause anyway.
If we simply make select_mergejoin_clauses() reject a clause as
not-mergejoinable if either side is equated to a constant, then
the problems go away. We'd have a new problem if this meant that
we fail to do a FULL JOIN, but AFAICS that can't happen: a variable
coming from a full join can't be equivalenced to a constant, except
perhaps in a below_outer_join eclass, which isn't considered redundant
for sorting anyway.
Bottom line is that it just takes another half dozen lines of code
to fix this; attached is the core of the fix (there are some
uninteresting prototype changes and macro-moving as well).
> I think I can fix this in a day or so, but I now definitely feel that
> we'll need an RC2 :-(
I no longer think that about this problem, but we've still got some
nasty-looking issues in xml.c, plus Hannes Dorbath's unsolved reports
of GIST/GIN problems ...
regards, tom lane
+ * Insist that each side have a non-redundant eclass. This
+ * restriction is needed because various bits of the planner expect
+ * that each clause in a merge be associatable with some pathkey in a
+ * canonical pathkey list, but redundant eclasses can't appear in
+ * canonical sort orderings. (XXX it might be worth relaxing this,
+ * but not enough time to address it for 8.3.)
+ * Note: it would be bad if this condition failed for an otherwise
+ * mergejoinable FULL JOIN clause, since that would result in
+ * undesirable planner failure. I believe that is not possible
+ * however; a variable involved in a full join could only appear
+ * in below_outer_join eclasses, which aren't considered redundant.
+ * This *can* happen for left/right join clauses, however: the
+ * outer-side variable could be equated to a constant. (Because we
+ * will propagate that constant across the join clause, the loss of
+ * ability to do a mergejoin is not really all that big a deal, and
+ * so it's not clear that improving this is important.)
+ cache_mergeclause_eclasses(root, restrictinfo);
+ if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) ||
+ have_nonmergeable_joinclause = true;
+ continue; /* can't handle redundant eclasses */
result_list = lappend(result_list, restrictinfo);
In response to
pgsql-hackers by date
|Next:||From: Tim Yardley||Date: 2008-01-09 19:20:18|
|Subject: tzdata issue on cross-compiled postgresql|
|Previous:||From: Gregory Stark||Date: 2008-01-09 19:05:07|
|Subject: Re: OUTER JOIN performance regression remains in 8.3beta4|