Re: Different results from identical matviews

From: Anders Steinlein <anders(at)e5r(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Different results from identical matviews
Date: 2020-07-02 21:32:48
Message-ID: CAC35HN=Nj0fpZ-i=LEb8JxN6RtD5rZoE+zjLGb+JdcajRNby=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Jul 2, 2020 at 5:43 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Anders Steinlein <anders(at)e5r(dot)no> writes:
> >> Even that perhaps isn't conclusive, so you could
> >> also try comparing the pg_rewrite.ev_action fields for the views'
> >> ON SELECT rules. (That might be a bit frustrating because of likely
> >> inconsistencies in node "location" fields; but any other difference
> >> is cause for suspicion.)
>
> > You're right, ev_action is indeed different:
> > ...
> > Is there somehow I can format them to make it easier to compare? My basic
> > attempts didn't help me much. I put them up in all their glories in
> > pastebins, since they are rather large. Please let me know if there is
> > somehow I can make this easier to look into.
>
> Yeah, expression trees are pretty unreadable :-(. I downloaded these,
> changed all the "location" fields to -1 to make them more comparable,
> and behold there are still a bunch of diffs. Here's one:
>
> [...]
>
> This is the internal form of a "JOIN ... USING (email)" construct.
> I didn't try to trace this back to exactly where it was in the source
> queries. The important thing here is that we have a couple of Vars
> of type 106893, which I gather must be citext or a domain over it.
>

Yes, we have a domain called `email` over citext. The reason for the
multiple casts to citext, IIRC, was that i.e. array_agg() didn't accept the
email domain directly.

> In the first tree, those are coerced via a no-op RelabelType operation
> into plain text (type OID 25) and then compared with the built-in texteq
> operator. In the second tree, they are coerced to some other non-built-in
> type (maybe plain citext?) and then compared with operator 106108.
>
> I am betting that 106084 is citext, 106108 is citext's equality operator,
> and the net implication of all this is that the original matview is doing
> the JOIN using case-sensitive equality whereas the new one is using
> case-insensitive equality.
>

This makes sense. We've narrowed the different results down to this exact
case; that is, for the rows that are missing in the old matview, they have
"email" entries in different case between some of the tables. So
case-sensitive equality checks on these will naturally be lost somewhere.
How this came to be is another question...

A plausible explanation for how things got that way is that citext's
> equality operator wasn't in your search_path when you created the original
> matview, but it is in view when you make the new one, allowing that
> equality operator to capture the interpretation of USING.

Possibly, since this view has existed for many years. However in general,
our multi-tenant migration system does SET search_path TO <tenant>, public for
all DDL we do, so unless we had an issue whey back when it should've been
present.

> Unfortunately,
> since the reverse-listing of this join is just going to say "USING
> (email)", there's no way to detect from human-readable output that the
> interpretation of the USING clauses is different. (We've contemplated
> introducing not-SQL-standard syntax to allow flagging such cases, but
> haven't pulled the trigger on that.)
>

If I'm reading this correctly, would this be a "reason" to be more explicit
when doing joins involving non-standard data types? I.e. would it be
"safer" to do ON x1.email::citext == x2.email::citext instead of USING (if
there is any difference at all...)?

I count five places in the query with similar operator substitutions.
> There are some other diffs in the trees that are a bit odd, but might be
> explained if the new view was made by dump/reload rather than from the
> identical SQL text the original view was made from; they all look like
> they are references to JOIN output columns rather than the underlying
> table columns or vice versa. That's probably harmless, but the different
> join operators certainly are not.
>

I looked over the procedure we used for the upgrade, and it was this:
Postgres 9.4 server backup using WAL-E, restored into 9.4 on a new box,
where we did some data cleanup and upgraded extensions including citext.
Then used pg_dump from 12, directory format, and restored into 12. Since
that time no DDL or extension changes have been made on the tables/view
involved here. Could the citext upgrade have had any effect here, messing
with dependencies somehow?

Anyway, I think I'll just recreate these views for all tenants to be on the
safe side. Still curious about how this came to be, but since it's about 10
years of history in this database I guess it will be hard to figure
anything out for sure.

Thanks a lot for looking into this, let me know if there's any reason to
dig further.

Best,
-- a.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2020-07-02 21:44:53 Re: Different results from identical matviews
Previous Message Adrian Klaver 2020-07-02 16:56:33 Re: survey: psql syntax errors abort my transactions