Re: Implementing Incremental View Maintenance

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Implementing Incremental View Maintenance
Date: 2020-11-12 12:37:42
Message-ID: 89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
Lists: pgsql-hackers

On 12.11.2020 14:53, Tatsuo Ishii wrote:
>> 1. Create pgbench database with scale 100.
>> pgbench speed at my desktop is about 10k TPS:
>>
>> pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres
>> tps = 10194.951827 (including connections establishing)
>>
>> 2. Then I created incremental materialized view:
>>
>> create incremental materialized view teller_sums as select
>> t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on
>> a.bid=t.bid group by t.tid;
>> SELECT 1000
>> Time: 20805.230 ms (00:20.805)
>>
>> 20 seconds is a reasonable time, comparable with the time of database
>> initialization.
>>
>> Then obviously we see advantages of precalculated aggregates:
>>
>> postgres=# select * from teller_sums where tid=1;
>>  tid |  sum
>> -----+--------
>>    1 | -96427
>> (1 row)
>>
>> Time: 0.871 ms
>> postgres=# select t.tid,sum(abalance) from pgbench_accounts a join
>> pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1
>> ;
>>  tid |  sum
>> -----+--------
>>    1 | -96427
>> (1 row)
>>
>> Time: 915.508 ms
>>
>> Amazing. Almost 1000 times difference!
>>
>> 3. Run pgbench once again:
>>
>> Ooops! Now TPS are much lower:
>>
>> tps = 141.767347 (including connections establishing)
>>
>> Speed of updates is reduced more than 70 times!
>> It looks like we lose parallelism, because I get almost the same result
>> with just one connection.
> How much TPS do you get if you execute pgbench -c 1 without
> incremental materialized view defined? If it's around 141 then we
> could surely confirm that the major bottleneck is locking contention.
>

My desktop has just 4 physical cores, so performance with one connection
is about 2k TPS:

pgbench -M prepared -N -c 1 -T 60 -P 1 postgres
tps = 1949.233532 (including connections establishing)

So there is still a large gap (~14 times) between the insert speed
with and without the incremental view.
I did more investigation and found out that one of the reasons for the bad
performance in this case is the lack of an index on the materialized view,
so each update has to perform a sequential scan through 1000 elements.

Well, creating proper indexes for a table is certainly the responsibility
of the DBA.
But users may not think of a materialized view as a normal table, so the
idea that an index has to be created explicitly for a materialized view is
not so obvious.
On the other hand, the materialized view implementation knows which index
is needed to perform an efficient incremental update.
I wonder if it could create such an index implicitly, or at least emit a
notice proposing to create it.
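
For this particular view the index in question is just an index on the
grouping column; what I created by hand for the test below was something
like this (only the index name is left to the system, nothing else is
assumed):

    create index on teller_sums(tid);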

In any case, after creating an index on the tid column of the materialized
view, pgbench speed increased from 141 to 331 TPS (more than two times)
with a single connection. But if I run pgbench with 10 connections, then
performance is even slightly lower: 289 TPS.

I looked through your patch for exclusive table locks and found this
fragment in matview.c:

    /*
     * Wait for concurrent transactions which update this materialized view at
     * READ COMMITED. This is needed to see changes committed in other
     * transactions. No wait and raise an error at REPEATABLE READ or
     * SERIALIZABLE to prevent update anomalies of matviews.
     * XXX: dead-lock is possible here.
     */
    if (!IsolationUsesXactSnapshot())
        LockRelationOid(matviewOid, ExclusiveLock);
    else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))

I replaced it with RowExclusiveLock and ... got 1437 TPS with 10 connections.
This is still about 7 times slower than the performance without the
incremental view, but the gap is no longer so dramatic. And it seems clear
that this exclusive lock on the matview is the real showstopper for
concurrent updates.
I do not know which race conditions and anomalies we could get if this
table-level ExclusiveLock is replaced with a weaker lock here.
But I think this problem should be addressed in any case: the single-client
update mode is a very rare scenario.
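
For anybody who wants to reproduce this, a simple way (my own sketch, not
part of the patch) to watch the contention on the matview while pgbench is
running is to query pg_locks:

    select mode, granted, count(*)
    from pg_locks
    where relation = 'teller_sums'::regclass
    group by mode, granted;

With the original ExclusiveLock one would expect to see a single granted
lock and most of the other backends waiting on it.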

I have attached to this mail a profile of the pgbench workload with the
incremental view (with index) defined.
Maybe you will find it useful.

One more disappointing observation about materialized views (this time
non-incremental).
Creation of a non-incremental materialized view takes about 18 seconds:

postgres=# create materialized view teller_avgs as select
t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on
a.bid=t.bid group by t.tid;
SELECT 1000
Time: 17795.395 ms (00:17.795)

But refreshing such a view takes 55 seconds:

postgres=# refresh materialized view teller_avgs;
REFRESH MATERIALIZED VIEW
Time: 55500.381 ms (00:55.500)

And the refresh time doesn't depend on the amount of updates since the last
refresh: I got almost the same time when I ran pgbench for one minute before
the refresh and when two refreshes were performed one after another.

Adding an index doesn't help much in this case, and concurrent refresh is
even slower:

postgres=# refresh materialized view concurrently teller_avgs;
REFRESH MATERIALIZED VIEW
Time: 56981.772 ms (00:56.982)

So it seems to be more efficient to drop and recreate the materialized view
rather than refresh it, at least in this case.
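
For completeness, the drop-and-recreate variant I mean can be wrapped in a
single transaction (assuming teller_avgs has no dependent objects; note that
the DROP takes an AccessExclusiveLock, so readers of the view are blocked
until commit):

    begin;
    drop materialized view teller_avgs;
    create materialized view teller_avgs as select
    t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on
    a.bid=t.bid group by t.tid;
    commit;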

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
imv.svg image/svg+xml 3.9 MB
