Re: cannot restore schema with is not distinct from on hstore since PG 9.6.8

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Marc Cousin <cousinmarc(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: cannot restore schema with is not distinct from on hstore since PG 9.6.8
Date: 2018-07-13 20:54:15
Message-ID: 10492.1531515255@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> On 07/09/2018 11:34 AM, Tom Lane wrote:
>> I think the most practical way to deal with this probably is to change
>> the parser so that the lookup works by finding a default btree or hash
>> opclass rather than by looking for "=" by name. We've made similar
>> changes in the past to get rid of implicit dependencies on operator
>> names, but those efforts never reached IS [NOT] DISTINCT FROM.

> I agree with your approach, including backpatching. I guess we'll have
> to try to think of some scenario that backpatching would break. Maybe to
> minimize any effect it should fall back on a default opclass if the
> search for "=" fails? Dunno how practical that is.

I poked into this area for awhile, and it turns out to be even a
worse can of worms than I thought. I looked through gram.y and
parse_expr.c, and identified several distinct classes of issue.
(I'm not promising that I found everything.)

1. LIKE, ILIKE, SIMILAR TO are converted to unqualified operator names
"~~", "~~*", etc. One might reflexively propose forcing these expansions
to be qualified with pg_catalog, but I'm afraid that would break use-cases
involving non-core data types, where the desired operator might well not
be in pg_catalog. However, I think that there's not actually any new
dump/restore hazard here, because of the fact that ruleutils.c simply
prints the results as the internal operator names without trying to
convert back to LIKE et al. If the operator isn't in the search path
it will get schema-qualified, and all's well. So my inclination in this
category is to do nothing. There's certainly a hazard for naive writers
of SQL, who may not realize that LIKE entails a search-path-dependent
lookup; but that hazard is no worse than it was before we decided to
lock down dump/restore search paths.

2. BETWEEN and related constructs are converted to boolean expressions
involving unqualified operator names ">=" etc. As above, we are saved
by the fact that what will be dumped out is the expanded expression,
providing room to schema-qualify the selected operators if needed.
So again I'm inclined to leave this alone. (This will be something
to think about if we ever get around to re-implementing BETWEEN as a
unitary construct, though.)

3. IN and NOT IN are converted to "= ANY" or "<> ALL" with unqualified
operator names. This is almost like the above cases, except that
ruleutils.c sometimes chooses to print the result using "IN" rather than
with the actual operator name. We might have to back off that readability
hack in some places (though it seems like it'd be OK if what would be
printed is exactly unqualified "= ANY" or "<> ALL").

4. As per original report, IS [NOT] DISTINCT FROM, as well as NULLIF,
are interpreted by doing a binary operator lookup using unqualified "="
as the operator name. My original thought of replacing that by using
default operator classes to identify the comparison semantics doesn't
really work, because there is no requirement that the two input values be
of the same datatype. For example, "int_var IS DISTINCT FROM numeric_var"
is handled perfectly well at present, but I see no principled way to
figure out what to do with that just by reference to operator classes.
We might have little choice but to invent a schema-qualifiable variant
syntax so we can represent these constructs unambiguously in dumps.
(Ugly as that is, it does have the advantage that we'd not be risking
subtle changes in the semantics of these operations.)

5. Row comparison constructs, for example ROW(a,b) < ROW(c,d), are handled
by doing column-by-column lookups using the specified operator name; this
example devolves to what "a < c" and "b < d" would be interpreted as.
Although we support OPERATOR() syntax in these constructs, that doesn't
get us far: if the original input was not a schema-qualified operator
then it's entirely possible that the per-column operators were found in
different schemas, which we can't handle by writing OPERATOR() in the
middle. (There's already a comment in ruleutils.c's handling of
RowCompareExpr complaining about this.)

One idea for resolving that is to extend the OPERATOR syntax to allow
multiple operator names for row comparisons, along the lines of
ROW(a,b) OPERATOR(pg_catalog.<, public.<) ROW(c,d)
This seems probably to be not terribly hard, although back-patching it
wouldn't be appetizing.

6. The row comparison problem also manifests for multiple-column cases
of ANY_SUBLINK and ALL_SUBLINK, for instance
WHERE (a, b) IN (SELECT x, y FROM ...)
Again, ruleutils.c contributes to the issue by printing "IN" even
though it knows that that isn't 100% accurate. But I think that here
again, a possible fix is to allow writing
WHERE (a, b) OPERATOR(pg_catalog.=, public.=) ANY (SELECT x, y FROM ...)
so that the operators to use can be locked down exactly.

7. The implicit-comparison form of CASE also resolves comparisons by
doing lookups with an assumed operator name of "=". For instance
CASE x WHEN 4 THEN ... WHEN 5.0 THEN ... END
might well choose different comparison operators for the two WHEN
clauses. We could imagine printing this as
CASE WHEN x = 4 THEN ... WHEN x = 5.0 THEN ... END
and schema-qualifying the operators as needed, but that's really
totally unsatisfactory if the test expression is expensive or volatile.
So I'm not sure what to do about this, but it seems like any real
answer will again involve introducing new syntax.

So this is all pretty messy, but on the bright side, fixing it would allow
cleaning up some ancient squishy coding in ruleutils.c. It wouldn't be
controversial as just a v12 addition, perhaps ... but do we have a choice
about back-patching? Dump/restore failures are not good.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David G. Johnston 2018-07-13 21:17:30 Re: cannot restore schema with is not distinct from on hstore since PG 9.6.8
Previous Message Tom Lane 2018-07-13 17:22:00 Re: Problem with tupdesc in jsonb_to_recordset

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-07-13 21:08:45 Re: [PATCH] LLVM tuple deforming improvements
Previous Message Andrew Dunstan 2018-07-13 20:43:12 Re: Vacuum: allow usage of more than 1GB of work mem