Re: Incrementally refreshed materialized view

From: hariprasath nallasamy <hariprasathnallasamy(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Adam Brusselback <adambrusselback(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Incrementally refreshed materialized view
Date: 2016-09-27 05:23:21
Message-ID: CAGgejVw9D8PMqd6qifDLmps6_JPV+9+Zm9WN14bAfmUPk==n5A@mail.gmail.com
Lists: pgsql-general

We also tried to implement incremental refresh of materialized views, but our
solution doesn't cover all use cases.

Components involved:
1) WAL
2) Logical decoding
3) Replication slots
4) Custom background worker

Two kinds of approaches:
1. Deferred refresh (Oracle-style: a log table is created for each base table,
holding its PK and the old and new values of the aggregated columns)
   a) A log table has to be created for each base table; it keeps track of
   the delta changes.
   b) A UDF is called to refresh the view incrementally. It runs the
   original materialized view query with the tracked delta PKs in its
   WHERE clause, so only rows that were modified or inserted are touched.
   c) The log table is filled with the changed rows from the data given by
   a replication slot, which uses logical decoding to decode the WAL.
   d) Shared memory is used to maintain the relationship between the
   view and its base tables. In case of a restart, these relationships are
   pushed to a maintenance table.
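
As a rough illustration of steps a) and b) above, here is a minimal sketch in
Python using the standard-library sqlite3 module instead of PostgreSQL. It is
not the code from the repository. The table names (orders, orders_log,
mv_big_orders) and the view query are made up for the example, the
"materialized view" is just a plain table, and the log table is filled by hand
here rather than from a replication slot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
    -- Hypothetical log table: one row per changed base-table PK.
    CREATE TABLE orders_log (id INTEGER);
    -- Stand-in for the materialized view: a table holding the query result.
    CREATE TABLE mv_big_orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
""")

# The "original materialized view query" (assumed for this example).
VIEW_QUERY = "SELECT id, customer, amount FROM orders WHERE amount >= 100"


def full_refresh(conn):
    """Rebuild the whole view from scratch."""
    conn.execute("DELETE FROM mv_big_orders")
    conn.execute("INSERT INTO mv_big_orders " + VIEW_QUERY)


def incremental_refresh(conn):
    """Touch only the view rows whose PK appears in the log table."""
    # Remove the stale versions of the changed rows (also covers deletes).
    conn.execute("DELETE FROM mv_big_orders WHERE id IN (SELECT id FROM orders_log)")
    # Re-run the view query restricted to the tracked delta PKs.
    conn.execute(
        "INSERT INTO mv_big_orders " + VIEW_QUERY
        + " AND id IN (SELECT id FROM orders_log)"
    )
    # The delta has been applied, so the log can be emptied.
    conn.execute("DELETE FROM orders_log")


conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "a", 50), (2, "b", 150), (3, "c", 200)],
)
full_refresh(conn)

# Changes on the base table, with their PKs recorded in the log table.
conn.execute("UPDATE orders SET amount = 120 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 3")
conn.execute("INSERT INTO orders VALUES (4, 'd', 10)")
conn.executemany("INSERT INTO orders_log VALUES (?)", [(1,), (3,), (4,)])

incremental_refresh(conn)
rows = conn.execute("SELECT id, amount FROM mv_big_orders ORDER BY id").fetchall()
expected = conn.execute(
    "SELECT id, amount FROM (" + VIEW_QUERY + ") ORDER BY id"
).fetchall()
```

After the incremental refresh, rows and expected should hold the same result,
although row 2 was never touched.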

2. Real-time refresh (update the view whenever we get any change-sets
related to its base tables)
   a) Delta data from the replication slot is applied to the view by
   checking the relationship between the delta data and the view definition.
   Shared memory and the maintenance table are used here as well.
   b) Work is completed only for materialized views on a single table.
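
To make the real-time idea a bit more concrete for the single-table case, here
is a small sketch in Python. The change-set format (a dict with "op", "pk" and
"new") is invented for the example and is not the output format of any
particular logical decoding plugin; the view is modelled as a dict keyed by PK,
and the view definition is reduced to a filter predicate.

```python
def apply_change(view, change, predicate):
    """Apply one decoded change to a single-table view keyed by PK."""
    pk = change["pk"]
    if change["op"] == "delete":
        # The base row is gone, so it cannot be in the view any more.
        view.pop(pk, None)
        return
    row = change["new"]
    if predicate(row):
        # Insert or update: the new row version satisfies the view definition.
        view[pk] = row
    else:
        # The new row version no longer qualifies (or never did).
        view.pop(pk, None)


def big_order(row):
    """Hypothetical view definition: WHERE amount >= 100."""
    return row["amount"] >= 100


view = {}
changes = [
    {"op": "insert", "pk": 1, "new": {"customer": "a", "amount": 150}},
    {"op": "insert", "pk": 2, "new": {"customer": "b", "amount": 20}},
    {"op": "update", "pk": 2, "new": {"customer": "b", "amount": 300}},
    {"op": "update", "pk": 1, "new": {"customer": "a", "amount": 10}},
    {"op": "delete", "pk": 2},
    {"op": "insert", "pk": 3, "new": {"customer": "c", "amount": 500}},
]
for change in changes:
    apply_change(view, change, big_order)
```

With joins, a single change can affect many view rows, so this simple per-row
check is no longer enough.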

Main disadvantages:
1) Data inconsistency when the master fails, and the slave doesn't have the
replication slot as of now. The 2ndQuadrant guys are trying to create slots on
the slave using some concept of failover slots, but that doesn't ship with PG
:(.
2) Only sum, count and avg are implemented for aggregates (single table); for
other aggregates a full refresh comes into play.
3) The right join implementation requires more queries to be run on top of
the MVs.
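
Regarding point 2), the reason sum, count and avg are the easy cases can be
sketched in a few lines of Python. The class below is only an illustration
(its name and methods are made up, it is not taken from the repository): given
the old and new values of the aggregated column, each group can be updated
without rescanning the base table.

```python
class IncrementalAgg:
    """Keeps sum and count per group; avg is derived from them."""

    def __init__(self):
        self.totals = {}  # group key -> [sum, count]

    def insert(self, key, value):
        entry = self.totals.setdefault(key, [0, 0])
        entry[0] += value
        entry[1] += 1

    def delete(self, key, old_value):
        entry = self.totals[key]
        entry[0] -= old_value
        entry[1] -= 1
        if entry[1] == 0:
            # No rows left in this group, so the group disappears.
            del self.totals[key]

    def update(self, key, old_value, new_value):
        # An update is a delete of the old value plus an insert of the new one.
        self.delete(key, old_value)
        self.insert(key, new_value)

    def row(self, key):
        total, count = self.totals[key]
        return {"sum": total, "count": count, "avg": total / count}


agg = IncrementalAgg()
agg.insert("a", 10)
agg.insert("a", 30)
agg.insert("b", 5)
agg.update("a", 10, 20)
agg.delete("b", 5)
result = agg.row("a")
```

Something like max or min does not work this way: when the current maximum is
deleted, the old and new values alone don't tell what the next maximum is, so
the group has to be recomputed.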

So we have a long way to go and don't know whether this is the right path.

Only the deferred refresh has been pushed to GitHub.
https://github.com/harry-2016/MV_IncrementalRefresh

I wrote a post about it on Medium.
https://medium.com/@hariprasathnallsamy/postgresql-materialized-view-incremental-refresh-44d1ca742599
