Better management of mergejoinable operators

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Better management of mergejoinable operators
Date: 2006-12-12 22:56:36
Message-ID: 27157.1165964196@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I noticed today that process_implied_equality() still contains an ugly
hack that should have been got rid of awhile ago: it assumes that
mergejoinable operators are named "=". This has been a bogus assumption
for several releases, as illustrated by this failure:

regression=# select * from text_tbl a,text_tbl b,text_tbl c where a.f1 ~=~ b.f1 and b.f1 ~=~ c.f1;
ERROR: equality operator for types text and text should be merge-joinable, but isn't

It can also be fooled by schema-search-path issues, if the needed
operator exists but isn't in the path. Since we've not heard complaints
from the field about this, I'm not feeling urgent about having a
back-patchable solution, but I want to find one going forward.

What is actually needed in this function is to be able to find a
mergejoinable equality operator whose oprlsortop and oprrsortop are the
two sortops already known for the input pathkey columns. We have a
couple of problems doing that though: first, with the present system
catalog layout there seems no way to do that short of a seqscan through
all of pg_operator; and second, what if there's not a unique answer,
ie, multiple equality operators alleging the same lsortop/rsortop?

Right offhand I cannot see a reason why there should be different
equality operators with the same sortops. (If anyone can come up with
a plausible scenario for that, stop me here...) So what I'm thinking
about is a unique index on oprlsortop/oprrsortop; that would both allow
efficient search, and prevent multiple answers.

Now we can't do that directly because most of the entries in pg_operator
in fact contain zeroes in these columns, and would cause uniqueness
failures. Probably the cleanest answer would be to allow these two
columns to be NULL, not zero, when not meaningful; but that would be a
bit of a mess to implement because of the code's assumption of fixed
layout for pg_operator tuples.

What I'm considering doing is moving the oprlsortop/oprrsortop/
oprltcmpop/oprgtcmpop fields out of pg_operator and into a new auxiliary
catalog, named say pg_mergejoinop, that would have entries only for
mergejoinable equality operators. This would have the same kind of
relationship to pg_operator that pg_aggregate has to pg_proc: if a
pg_operator entry has "oprcanmerge" true, then there's an extension
row for it in pg_mergejoinop. The catalog would be fairly small and
cheap to search (48 entries in a default install, as of CVS head),
and could support a unique index to constrain the oprlsortop/oprrsortop
columns.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-12-12 23:02:12 Re: psql commandline conninfo
Previous Message Martijn van Oosterhout 2006-12-12 22:52:29 Re: psql commandline conninfo