Fixing some issues in matview.c's refresh-query construction

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Fixing some issues in matview.c's refresh-query construction
Date: 2018-03-18 22:47:07
Message-ID: 13836.1521413227@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While looking at the performance problem Jeff Janes reported recently[1],
I noticed several other pre-existing deficiencies in the way that
refresh_by_match_merge() generates its query for constructing the diff
table:

1. It doesn't require the selected unique index(es) to be indimmediate.
Perhaps this isn't a bug, but I don't really see why: if they're
deferred constraints, then they might not represent conditions that
actually hold right now.

2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass. The datatype might not even *have* a
default btree opclass. We must instead look up, and use, the equality
operator associated with the index's actual opclass.

3. It doesn't check that the indexes are btrees. Even though there are no
other unique-capable index AMs today (at least not in core), I think it's
a good idea to insist on btrees. We couldn't be sure that the planner
could implement FULL JOIN with an operator that is not btree equality, nor
do we have a convention for identifying what is the equality operator for
an index opclass if it's not btree.

4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query. It's not enough to use
OPERATOR(schema.op) notation; you have to also be careful to cast the
inputs if they're not already of the operator's input types. The correct
way to do this is the way that the ri_triggers.c code does it. (This
would have been a security bug before CVE-2018-1058.)

5. While not actually a bug, I didn't like the fact that the conditions
for an index being usable for the match merge were written out in two
places. That's too easy to break. I also didn't like the fact that
refresh_by_match_merge() would segfault if it came across an index on a
system column, such as the OID column. That's just latent, since matviews
can't have OID columns today, but we really ought to check that we are
considering only indexes on user columns.

The attached patch rectifies these problems. Rather than duplicating
ri_triggers.c's code for emitting a safe equality clause, I thought the
best way to proceed was to have ruleutils.c export it for use by the
other modules. That's where we have most query-deconstruction code,
so just exporting it from ri_triggers.c seemed a bit too historical.

Barring objections, I propose to back-patch this as a bug fix.

regards, tom lane

[1] https://postgr.es/m/CAMkU=1z-JoGymHneGHar1cru4F1XDfHqJDzxP_CtK5cL3DOfmg@mail.gmail.com

Attachment Content-Type Size
fix-corner-case-matview-refresh-issues-1.patch text/x-diff 20.2 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2018-03-18 22:52:30 Re: MCV lists for highly skewed distributions
Previous Message Chapman Flack 2018-03-18 21:54:22 Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility