Re: Implementing Incremental View Maintenance

From: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, thomas(dot)munro(at)gmail(dot)com, hoshiai(at)sraoss(dot)co(dot)jp, kgrittn(at)gmail(dot)com
Subject: Re: Implementing Incremental View Maintenance
Date: 2019-09-30 13:34:14
Message-ID: 20190930223414.62e45861094e8ee68c2d0cbb@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached is the latest patch for supporting self-join views. This also
including the following fix mentioned by Tatsuo Ishii.

> > On 2019-Aug-06, Tatsuo Ishii wrote:
> >
> >> It's not mentioned below but some bugs including seg fault when
> >> --enable-casser is enabled was also fixed in this patch.
> >>
> >> BTW, I found a bug with min/max support in this patch and I believe
> >> Yugo is working on it. Details:
> >> https://github.com/sraoss/pgsql-ivm/issues/20

This patch allows to support self-join views, simultaneous updates of more
than one base tables, and also multiple updates of the same base table.
I first tried to support just self-join, but I found that this is essentially
same as to support simultaneous table updates, so I decided to support them in
the same commit. I think this will be a base for implementing
Deferred-maintenance in future.

In the new implementation, AFTER triggers are used to collecting tuplestores
containing transition table contents. When multiple tables are changed,
multiple AFTER triggers are invoked, then the final AFTER trigger performs
actual update of the matview. In addition AFTER trigger, also BEFORE trigger
is used to handle global information for view maintenance.

For example, suppose that we have a view V joining table R,S, and new tuples are
inserted to each table, dR,dS, and dT respectively.

V = R*S*T
R_new = R + dR
S_new = S + dS
T_new = T + dT

In this situation, we can calculate the new view state as bellow.

V_new
= R_new * S_new * T_new
= (R + dR) * (S + dS) * (T + dT)
= R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT
= V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT
= V + (dR *S_new*T_new) + (R*dS*T_new) + (R*S*dT)

To calculate view deltas, we need both pre-state (R,S, and T) and post-state
(R_new, S_new, and T_new) of base tables.

Post-update states are available in AFTER trigger, and we calculate pre-update
states by filtering inserted tuples using cmin/xmin system columns, and appendding
deleted tuples which are contained in a old transition table.

In the original core implementation, tuplestores of transition tables were
freed for each query depth. However, we want to prolong their life plan because
we have to preserve these for a whole query assuming some base tables are changed
in other trigger functions, so I added a hack to trigger.c.

Regression tests are also added for self join view, multiple change on the same
table, simultaneous two table changes, and foreign reference constrains.

Here are behavior examples:

1. Table definition
- t: for self-join
- r,s: for 2-ways join

CREATE TABLE r (i int, v int);
CREATE TABLE
CREATE TABLE s (i int, v int);
CREATE TABLE
CREATE TABLE t (i int, v int);
CREATE TABLE

2. Initial data

INSERT INTO r VALUES (1, 10), (2, 20), (3, 30);
INSERT 0 3
INSERT INTO s VALUES (1, 100), (2, 200), (3, 300);
INSERT 0 3
INSERT INTO t VALUES (1, 10), (2, 20), (3, 30);
INSERT 0 3

3. View definition

3.1. self-join(mv_self, v_slef)

CREATE INCREMENTAL MATERIALIZED VIEW mv_self(v1, v2) AS
SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i;
SELECT 3
CREATE VIEW v_self(v1, v2) AS
SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i;
CREATE VIEW

3.2. 2-ways join (mv, v)

CREATE INCREMENTAL MATERIALIZED VIEW mv(v1, v2) AS
SELECT r.v, s.v FROM r JOIN s USING(i);
SELECT 3
CREATE VIEW v(v1, v2) AS
SELECT r.v, s.v FROM r JOIN s USING(i);
CREATE VIEW

3.3 Initial contents

SELECT * FROM mv_self ORDER BY v1;
v1 | v2
----+----
10 | 10
20 | 20
30 | 30
(3 rows)

SELECT * FROM mv ORDER BY v1;
v1 | v2
----+-----
10 | 100
20 | 200
30 | 300
(3 rows)

4. Update a base table for the self-join view

INSERT INTO t VALUES (4,40);
INSERT 0 1
DELETE FROM t WHERE i = 1;
DELETE 1
UPDATE t SET v = v*10 WHERE i=2;
UPDATE 1

4.1. Results
- Comparison with the normal view

SELECT * FROM mv_self ORDER BY v1;
v1 | v2
-----+-----
30 | 30
40 | 40
200 | 200
(3 rows)

SELECT * FROM v_self ORDER BY v1;
v1 | v2
-----+-----
30 | 30
40 | 40
200 | 200
(3 rows)

5. pdate a base table for the 2-way join view

WITH
ins_r AS (INSERT INTO r VALUES (1,11) RETURNING 1),
ins_r2 AS (INSERT INTO r VALUES (3,33) RETURNING 1),
ins_s AS (INSERT INTO s VALUES (2,222) RETURNING 1),
upd_r AS (UPDATE r SET v = v + 1000 WHERE i = 2 RETURNING 1),
dlt_s AS (DELETE FROM s WHERE i = 3 RETURNING 1)
SELECT NULL;
?column?
----------

(1 row)

5.1. Results
- Comparison with the normal view

SELECT * FROM mv ORDER BY v1;
v1 | v2
------+-----
10 | 100
11 | 100
1020 | 200
1020 | 222
(4 rows)

SELECT * FROM v ORDER BY v1;
v1 | v2
------+-----
10 | 100
11 | 100
1020 | 200
1020 | 222
(4 rows)

========

Best Regards,
Yugo Nagata

--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>

Attachment Content-Type Size
WIP_immediate_IVM_v6.patch text/x-diff 121.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-09-30 13:37:49 Re: SSL tests failing for channel_binding with OpenSSL <= 1.0.1
Previous Message Tomas Vondra 2019-09-30 12:49:44 Re: Online checksums patch - once again