Re: Implementing Incremental View Maintenance

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Implementing Incremental View Maintenance
Date: 2020-11-11 16:10:35
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05.10.2020 12:16, Yugo NAGATA wrote:
> Hi,
> Attached is the rebased patch (v18) to add support for Incremental
> Materialized View Maintenance (IVM). It is able to be applied to
> current latest master branch.

Thank you very much for this work.
I consider incremental materialized views as "reincarnation" of OLAP
There are two approaches of making OLAP queries faster:
1. speed up query execution (using JIT, columnar store, vector
operations and parallel execution)
2. precalculate requested data

Incremental materialize views make it possible to implement second
approach. But how competitive it is?
I do not know current limitations of incremental materialized views, but
I checked that basic OLAP functionality:

The patch is not applied to the current master because makeFuncCall
prototype is changed,
I fixed it by adding COAERCE_CALL_EXPLICIT.
Then I did the following simple test:

1. Create pgbench database with scale 100.
pgbench speed at my desktop is about 10k TPS:

pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres
tps = 10194.951827 (including connections establishing)

2. Then I created incremental materialized view:

create incremental materialized view teller_sums as select
t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on group by t.tid;
Time: 20805.230 ms (00:20.805)

20 second is reasonable time, comparable with time of database

Then obviously we see advantages of precalculated aggregates:

postgres=# select * from teller_sums where tid=1;
 tid |  sum
   1 | -96427
(1 row)

Time: 0.871 ms
postgres=# select t.tid,sum(abalance) from pgbench_accounts a join
pgbench_tellers t on group by t.tid having t.tid=1
 tid |  sum
   1 | -96427
(1 row)

Time: 915.508 ms

Amazing. Almost 1000 times difference!

3. Run pgbench once again:

Ooops! Now TPS are much lower:

tps = 141.767347 (including connections establishing)

Speed of updates is reduced more than 70 times!
Looks like we loose parallelism because almost the same result I get
with just one connection.

4. Finally let's create one more view (it is reasonable to expect that
analytics will run many different queries and so need multiple views).

create incremental materialized view teller_avgs as select
t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on group by t.tid;

It is great that not only simple aggregates like SUM are supported, but
also AVG.
But insertion speed now is reduced twice - 72TPS.

I tried to make some profiling but didn't see something unusual:

  16.41%  postgres  postgres            [.] ExecInterpExpr
   8.78%  postgres  postgres            [.] slot_deform_heap_tuple
   3.23%  postgres  postgres            [.] ExecMaterial
   2.71%  postgres  postgres            [.] AllocSetCheck
   2.33%  postgres  postgres            [.] AllocSetAlloc
   2.29%  postgres  postgres            [.] slot_getsomeattrs_int
   2.26%  postgres  postgres            [.] ExecNestLoop
   2.11%  postgres  postgres            [.] MemoryContextReset
   1.98%  postgres  postgres            [.] tts_minimal_store_tuple
   1.87%  postgres  postgres            [.] heap_compute_data_size
   1.78%  postgres  postgres            [.] fill_val
   1.56%  postgres  postgres            [.] tuplestore_gettuple
   1.44%  postgres  postgres            [.] sentinel_ok
   1.35%  postgres  postgres            [.] heap_fill_tuple
   1.27%  postgres  postgres            [.] tuplestore_gettupleslot
   1.17%  postgres  postgres            [.] ExecQual
   1.14%  postgres  postgres            [.] tts_minimal_clear
   1.13%  postgres  postgres            [.] CheckOpSlotCompatibility
   1.10%  postgres  postgres            [.] base_yyparse
   1.10%  postgres  postgres            [.] heapgetpage
   1.04%  postgres  postgres            [.] heap_form_minimal_tuple
   1.00%  postgres  postgres            [.] slot_getsomeattrs

So good news is that incremental materialized views really work.
And bad news is that maintenance overhead is too large which
significantly restrict applicability of this approach.
Certainly in case of dominated read-only workload such materialized
views can significantly improve performance.
But unfortunately my dream that them allow to combine OLAP+OLPT is not
currently realized.

Konstantin Knizhnik
Postgres Professional:
The Russian Postgres Company

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2020-11-11 16:11:38 Re: Prevent printing "next step instructions" in initdb and pg_upgrade
Previous Message Tim.Colles 2020-11-11 16:04:37 RE: POC: postgres_fdw insert batching