Re: Implementing Incremental View Maintenance

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, takuma(dot)hoshiai(at)gmail(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Implementing Incremental View Maintenance
Date: 2020-09-05 05:56:18
Message-ID: CA+hUKG+2ze0_kcZt9AAUDufP0cNL+jtS5BsTmys3c-9S3Ecfng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Nagata-san,

On Mon, Aug 31, 2020 at 5:32 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

Thanks for writing this!

+ /*
+ * Wait for concurrent transactions which update this materialized view at
+ * READ COMMITED. This is needed to see changes committed in other
+ * transactions. No wait and raise an error at REPEATABLE READ or
+ * SERIALIZABLE to prevent update anomalies of matviews.
+ * XXX: dead-lock is possible here.
+ */
+ if (!IsolationUsesXactSnapshot())
+ LockRelationOid(matviewOid, ExclusiveLock);
+ else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))

Could you please say a bit more about your plans for concurrency control?

Simple hand-crafted "rollup" triggers typically conflict only when
modifying the same output rows due to update/insert conflicts, or
perhaps some explicit row level locking if they're doing something
complex (unfortunately, they also very often have concurrency
bugs...). In some initial reading about MV maintenance I did today in
the hope of understanding some more context for this very impressive
but rather intimidating patch set, I gained the impression that
aggregate-row locking granularity is assumed as a baseline for eager
incremental aggregate maintenance. I understand that our
MVCC/snapshot scheme introduces extra problems, but I'm wondering if
these problems can be solved using the usual update semantics (the
EvalPlanQual mechanism), and perhaps also some UPSERT logic. Why is
it not sufficient to have locked all the base table rows that you have
modified, captured the before-and-after values generated by those
updates, and also locked all the IMV aggregate rows you will read, and
in the process acquired a view of the latest committed state of the
IMV aggregate rows you will modify (possibly having waited first)? In
other words, what other data do you look at, while computing the
incremental update, that might suffer from anomalies because of
snapshots and concurrency? For one thing, I am aware that unique
indexes for groups would probably be necessary; perhaps some subtle
problems of the sort usually solved with predicate locks lurk there?

(Newer papers describe locking schemes that avoid even aggregate-row
level conflicts, by taking advantage of the associativity and
commutativity of aggregates like SUM and COUNT. You can allow N
writers to update the aggregate concurrently, and if any transaction
has to roll back it subtracts what it added, not necessarily restoring
the original value, so that nobody conflicts with anyone else, or
something like that... Contemplating an MVCC, no-rollbacks version of
that sort of thing leads to ideas like, I dunno, update chains
containing differential update trees to be compacted later... egad!)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-09-05 06:14:14 Use <unnamed> for name of unnamed portal's memory context
Previous Message Jesse Zhang 2020-09-05 04:23:12 Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur