Re: Implementing Incremental View Maintenance

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Implementing Incremental View Maintenance
Date: 2020-11-24 09:21:33
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 12 Nov 2020 15:37:42 +0300
Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> Well, creation of proper indexes for table is certainly responsibility
> of DBA.
> But users may not consider materialized view as normal table. So the
> idea that index should
> be explicitly created for materialized view seems to be not so obvious.
> From the other side, implementation of materialized view knows which
> index is needed for performing efficient incremental update.
> I wonder if it can create such index itself implicitly or at least
> produce notice with proposal to create such index.

That makes sense. Especially for aggregate views, it is obvious that
creating an index on expressions used in GROUP BY is effective. For
other views, creating an index on columns that come from primary keys
of base tables would be effective if any.

However, if any base table doesn't have a primary or unique key or such
key column is not contained in the view's target list, it is hard to
decide an appropriate index on the view. We can create an index on all
columns in the target list, but it could cause overhead on view maintenance.
So, just producing notice would be better for such cases.

> I looked throw your patch for exclusive table locks and found this
> fragment in matview.c:
>     /*
>      * Wait for concurrent transactions which update this materialized
> view at
>      * READ COMMITED. This is needed to see changes committed in other
>      * transactions. No wait and raise an error at REPEATABLE READ or
>      * SERIALIZABLE to prevent update anomalies of matviews.
>      * XXX: dead-lock is possible here.
>      */
>     if (!IsolationUsesXactSnapshot())
>         LockRelationOid(matviewOid, ExclusiveLock);
>     else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))
> I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections.
> It is still about 7 times slower than performance without incremental view.
> But now the gap is not so dramatic. And it seems to be clear that this
> exclusive lock on matview is real show stopper for concurrent updates.
> I do not know which race conditions and anomalies we can get if replace
> table-level lock with row-level lock here.

I explained it here:

For example, suppose there is a view V = R*S that joins tables R and S,
and there are two concurrent transactions T1 which changes table R to R'
and T2 which changes S to S'. Without any lock, in READ COMMITTED mode,
V would be updated to R'*S in T1, and R*S' in T2, so it would cause
inconsistency. By locking the view V, transactions T1, T2 are processed
serially and this inconsistency can be avoided.

Especially, suppose that tuple dR is inserted into R in T1, and dS is
inserted into S in T2, where dR and dS will be joined in according to
the view definition. In this situation, without any lock, the change of V is
computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
be included in the results. This inconsistency could not be resolved by
row-level lock.

> But I think that this problem should be addressed in any case: single
> client update mode is very rare scenario.

This behavior is explained in rules.sgml like this:

+<title>Concurrent Transactions</title>
+ Suppose an <acronym>IMMV</acronym> is defined on two base tables and each
+ table was modified in different a concurrent transaction simultaneously.
+ In the transaction which was committed first, <acronym>IMMV</acronym> can
+ be updated considering only the change which happened in this transaction.
+ On the other hand, in order to update the view correctly in the transaction
+ which was committed later, we need to know the changes occurred in
+ both transactions. For this reason, <literal>ExclusiveLock</literal>
+ is held on an <acronym>IMMV</acronym> immediately after a base table is
+ modified in <literal>READ COMMITTED</literal> mode to make sure that
+ the <acronym>IMMV</acronym> is updated in the latter transaction after
+ the former transaction is committed. In <literal>REPEATABLE READ</literal>
+ or <literal>SERIALIZABLE</literal> mode, an error is raised immediately
+ if lock acquisition fails because any changes which occurred in
+ other transactions are not be visible in these modes and
+ <acronym>IMMV</acronym> cannot be updated correctly in such situations.

Hoever, should we describe explicitly its impact on performance here?

> I attached to this mail profile of pgbench workload with defined
> incremental view (with index).
> May be you will find it useful.

Thank you for your profiling! Hmm, it shows that overhead of executing
query for calculating the delta (refresh_mateview_datfill) and applying
the delta (SPI_exec) is large.... I will investigate if more optimizations
to reduce the overhead is possible.

> One more disappointing observation of materialized views (now
> non-incremental).
> Time of creation of non-incremental materialized view is about 18 seconds:
> postgres=# create materialized view teller_avgs as select
> t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on
> group by t.tid;
> SELECT 1000
> Time: 17795.395 ms (00:17.795)
> But refresh of such view takes 55 seconds:
> postgres=# refresh materialized view teller_avgs;
> Time: 55500.381 ms (00:55.500)

Hmm, interesting... I would like to investigate this issue, too.

Yugo Nagata

Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2020-11-24 09:46:57 Re: Implementing Incremental View Maintenance
Previous Message Yugo NAGATA 2020-11-24 09:21:14 Re: Implementing Incremental View Maintenance