Re: Implementing Incremental View Maintenance

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Implementing Incremental View Maintenance
Date: 2020-11-24 10:11:38
Message-ID: 20201124191138.f57e0955622f09e333538d7b@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 24 Nov 2020 12:46:57 +0300
Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

>
>
> On 24.11.2020 12:21, Yugo NAGATA wrote:
> >
> >> I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections.
> >> It is still about 7 times slower than performance without incremental view.
> >> But now the gap is not so dramatic. And it seems to be clear that this
> >> exclusive lock on matview is real show stopper for concurrent updates.
> >> I do not know which race conditions and anomalies we can get if replace
> >> table-level lock with row-level lock here.
> > I explained it here:
> > https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
> >
> > For example, suppose there is a view V = R*S that joins tables R and S,
> > and there are two concurrent transactions T1 which changes table R to R'
> > and T2 which changes S to S'. Without any lock, in READ COMMITTED mode,
> > V would be updated to R'*S in T1, and R*S' in T2, so it would cause
> > inconsistency. By locking the view V, transactions T1, T2 are processed
> > serially and this inconsistency can be avoided.
> >
> > Especially, suppose that tuple dR is inserted into R in T1, and dS is
> > inserted into S in T2, where dR and dS will be joined in according to
> > the view definition. In this situation, without any lock, the change of V is
> > computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
> > be included in the results. This inconsistency could not be resolved by
> > row-level lock.
> >
> >> But I think that this problem should be addressed in any case: single
> >> client update mode is very rare scenario.
> > This behavior is explained in rules.sgml like this:
> >
> > +<sect2>
> > +<title>Concurrent Transactions</title>
> > +<para>
> > + Suppose an <acronym>IMMV</acronym> is defined on two base tables and each
> > + table was modified in different a concurrent transaction simultaneously.
> > + In the transaction which was committed first, <acronym>IMMV</acronym> can
> > + be updated considering only the change which happened in this transaction.
> > + On the other hand, in order to update the view correctly in the transaction
> > + which was committed later, we need to know the changes occurred in
> > + both transactions. For this reason, <literal>ExclusiveLock</literal>
> > + is held on an <acronym>IMMV</acronym> immediately after a base table is
> > + modified in <literal>READ COMMITTED</literal> mode to make sure that
> > + the <acronym>IMMV</acronym> is updated in the latter transaction after
> > + the former transaction is committed. In <literal>REPEATABLE READ</literal>
> > + or <literal>SERIALIZABLE</literal> mode, an error is raised immediately
> > + if lock acquisition fails because any changes which occurred in
> > + other transactions are not be visible in these modes and
> > + <acronym>IMMV</acronym> cannot be updated correctly in such situations.
> > +</para>
> > +</sect2>
> >
> > Hoever, should we describe explicitly its impact on performance here?
> >
>
> Sorry, I didn't think much about this problem.
> But I think that it is very important to try to find some solution of
> the problem.
> The most obvious optimization is not to use exclusive table lock if view
> depends just on one table (contains no joins).
> Looks like there are no any anomalies in this case, are there?

Thank you for your suggestion! That makes sense.

> Yes, most analytic queries contain joins (just two queries among 22
> TPC-H  have no joins).
> So may be this optimization will not help much.

Yes, but if a user want to incrementally maintain only aggregate views on a large
table, like TPC-H Q1, it will be helpful. For this optimization, we have to only
check the number of RTE in the rtable list and it would be cheap.

> I wonder if it is possible to somehow use predicate locking mechanism of
> Postgres to avoid this anomalies without global lock?

You mean that, ,instead of using any table lock, if any possibility of the
anomaly is detected using predlock mechanism then abort the transaction?

I don't have concrete idea to implement it and know if it is possible yet,
but I think it is worth to consider this. Thanks.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2020-11-24 10:36:58 Re: LogwrtResult contended spinlock
Previous Message Anastasia Lubennikova 2020-11-24 10:04:58 Re: psql: add \si, \sm, \st and \sr functions to show CREATE commands for indexes, matviews, triggers and tables