Re: Problem with CVS HEAD's handling of mergejoins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Problem with CVS HEAD's handling of mergejoins
Date: 2008-01-09 19:13:22
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> A perhaps less invasive idea is to discard any proposed mergeclauses
> that are redundant in this sense. This would still require some
> reshuffling of responsibility between select_mergejoin_clauses and
> the code in pathkeys.c, since right now select_mergejoin_clauses
> takes no account of that. However, I'm worried that that might result
> in planner failure on some FULL JOIN cases that work today, since we
> require all the join clauses to be mergejoinable for a FULL JOIN.
> People seem to complain when the planner fails, even for really
> stupid queries ;-). I think this would only be acceptable if we can
> prove that ignoring clauses that are redundant in this sense doesn't
> change the result --- which might be the case, but I'm not sure.

On further study, this doesn't seem nearly as bad as it looked. I had
hastily misread some of the existing code as assuming one-for-one
correspondence of mergeclauses and pathkeys, but actually it just
assumes that the pathkeys list contains at least one pathkey matching
each mergeclause. Redundancy between two pathkeys in a list is
eliminated by allowing adjacent mergeclauses to share the same pathkey.
So that part actually all works, and the only problem is when an
equivalence class is redundant-for-sorting in itself --- that is,
when one of the eclass members is a constant. So that explains why
we'd not seen it before; except in very odd corner cases, the old code
would have eliminated the mergejoinable clause anyway.

If we simply make select_mergejoin_clauses() reject a clause as
not-mergejoinable if either side is equated to a constant, then
the problems go away. We'd have a new problem if this meant that
we fail to do a FULL JOIN, but AFAICS that can't happen: a variable
coming from a full join can't be equivalenced to a constant, except
perhaps in a below_outer_join eclass, which isn't considered redundant
for sorting anyway.

Bottom line is that it just takes another half dozen lines of code
to fix this; attached is the core of the fix (there are some
uninteresting prototype changes and macro-moving as well).

> I think I can fix this in a day or so, but I now definitely feel that
> we'll need an RC2 :-(

I no longer think that about this problem, but we've still got some
nasty-looking issues in xml.c, plus Hannes Dorbath's unsolved reports
of GIST/GIN problems ...

regards, tom lane

+ /*
+ * Insist that each side have a non-redundant eclass. This
+ * restriction is needed because various bits of the planner expect
+ * that each clause in a merge be associatable with some pathkey in a
+ * canonical pathkey list, but redundant eclasses can't appear in
+ * canonical sort orderings. (XXX it might be worth relaxing this,
+ * but not enough time to address it for 8.3.)
+ *
+ * Note: it would be bad if this condition failed for an otherwise
+ * mergejoinable FULL JOIN clause, since that would result in
+ * undesirable planner failure. I believe that is not possible
+ * however; a variable involved in a full join could only appear
+ * in below_outer_join eclasses, which aren't considered redundant.
+ *
+ * This *can* happen for left/right join clauses, however: the
+ * outer-side variable could be equated to a constant. (Because we
+ * will propagate that constant across the join clause, the loss of
+ * ability to do a mergejoin is not really all that big a deal, and
+ * so it's not clear that improving this is important.)
+ */
+ cache_mergeclause_eclasses(root, restrictinfo);
+ if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) ||
+ EC_MUST_BE_REDUNDANT(restrictinfo->right_ec))
+ {
+ have_nonmergeable_joinclause = true;
+ continue; /* can't handle redundant eclasses */
+ }
result_list = lappend(result_list, restrictinfo);

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tim Yardley 2008-01-09 19:20:18 tzdata issue on cross-compiled postgresql
Previous Message Gregory Stark 2008-01-09 19:05:07 Re: OUTER JOIN performance regression remains in 8.3beta4