Re: ERROR: left and right pathkeys do not match in mergejoin

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ERROR: left and right pathkeys do not match in mergejoin
Date: 2018-02-22 23:59:58
Message-ID: 754.1519343998@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
>> The third possibility is to decide that create_mergejoin_plan is being
>> overly paranoid and it's okay to extract merge details from a "redundant"
>> path key even though it specifies the opposite sort order from what the
>> current merge clause seems to need.

> This check looks necessary to me, because merge join can't work with
> such a combination of outer pathkeys and merge clauses.

No, I think it's okay, for exactly the same reason that the pathkey got
thrown away as redundant in the first place. That is, imagine that we
are mergejoining on "o.a = i.x AND o.b = i.x", and we know that the sort
ordering of the outer relation is "a ASC, b DESC". It would appear that
this means the inner relation needs to be ordered like "x ASC, x DESC",
and certainly if we were to play dumb and sort it that way, you'd expect
everything to work. The reason that we're having trouble is that the
pathkey machinery knows that "x ASC, x DESC" is stupid, and it throws
away the second column. It would not matter whether we considered the
sort ordering of the inner relation to be "x ASC, x DESC", or just
"x ASC", or indeed "x ASC, x ASC": all of those describe exactly the same
row ordering, because we only get to comparing the second sort column for
rows having equal values in the first sort column, and then our conclusion
must still be that the rows' values are equal.

This logic still goes through if the sort column contents are not simply
duplicate variables but distinct variables that have gotten put into the
same eclass, that is "SELECT ... WHERE x = y ORDER BY x ASC, y DESC".
Equal is equal. (We go to some pains to be sure this is true, ie that
the equality operator's semantics agree with the ordering operator's.)

So essentially, we can lie to the executor and say that we sorted the
inner rel as "x ASC, x DESC", even though we really did no such thing,
because there will be no difference in the actual input row order from
what it would be if we had done that.

Attached is a completed patch that fixes this and also deals with the
overall topic that the inner and outer pathkeys aren't so interchangeable
as some parts of the code thought. Maybe the comments could use further
improvement but I think it's OK otherwise. I haven't started on
back-porting this, but the bugs are demonstrable back to 8.3, so that
needs to be done.

regards, tom lane

Attachment Content-Type Size
fix-mergejoin-redundant-pathkey-issues.patch text/x-diff 27.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2018-02-23 00:15:05 Re: Translations contributions urgently needed
Previous Message Alvaro Herrera 2018-02-22 23:52:12 Re: FOR EACH ROW triggers on partitioned tables