Re: Implementing Incremental View Maintenance

From: nuko yokohama <nuko(dot)yokohama(at)gmail(dot)com>
To: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Implementing Incremental View Maintenance
Date: 2019-12-04 12:18:02
Message-ID: CAF3Gu1YyBSXT8rjJJK_MqNS02An=q6U8sCDR0PRUCaMwzocyNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

I found the problem after running "ALTER MATERIALIZED VIEW ... RENAME TO".
If a view created with "CREATE INCREMENT MATERIALIZED VIEW" is renamed,
subsequent INSERT operations to the base table will fail.

Error message.
```
ERROR: could not open relation with OID 0
```

Execution log.
```
[ec2-user(at)ip-10-0-1-10 ivm]$ psql -U postgres test -e -f
~/test/ivm/alter_rename_bug.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:1: NOTICE: drop cascades
to materialized view group_imv
DROP TABLE
CREATE TABLE table_x AS
SELECT generate_series(1, 10000) AS id,
ROUND(random()::numeric * 100, 2) AS data,
CASE (random() * 5)::integer
WHEN 4 THEN 'group-a'
WHEN 3 THEN 'group-b'
ELSE 'group-c'
END AS part_key
;
SELECT 10000
Table "public.table_x"
Column | Type | Collation | Nullable | Default
----------+---------+-----------+----------+---------
id | integer | | |
data | numeric | | |
part_key | text | | |

DROP MATERIALIZED VIEW IF EXISTS group_imv;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:15: NOTICE: materialized
view "group_imv" does not exist, skipping
DROP MATERIALIZED VIEW
CREATE INCREMENTAL MATERIALIZED VIEW group_imv AS
SELECT part_key, COUNT(*), MAX(data), MIN(data), SUM(data), AVG(data)
FROM table_x
GROUP BY part_key;
SELECT 3
List of relations
Schema | Name | Type | Owner
--------+-----------+-------------------+----------
public | group_imv | materialized view | postgres
public | table_x | table | postgres
(2 rows)

Materialized view "public.group_imv"
Column | Type | Collation | Nullable | Default
-------------------+---------+-----------+----------+---------
part_key | text | | |
count | bigint | | |
max | numeric | | |
min | numeric | | |
sum | numeric | | |
avg | numeric | | |
__ivm_count_max__ | bigint | | |
__ivm_count_min__ | bigint | | |
__ivm_count_sum__ | bigint | | |
__ivm_count_avg__ | bigint | | |
__ivm_sum_avg__ | numeric | | |
__ivm_count__ | bigint | | |

SELECT * FROM group_imv ORDER BY part_key;
part_key | count | max | min | sum | avg
----------+-------+-------+------+-----------+---------------------
group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
(3 rows)

ALTER MATERIALIZED VIEW group_imv RENAME TO group_imv2;
ALTER MATERIALIZED VIEW
List of relations
Schema | Name | Type | Owner
--------+------------+-------------------+----------
public | group_imv2 | materialized view | postgres
public | table_x | table | postgres
(2 rows)

Materialized view "public.group_imv2"
Column | Type | Collation | Nullable | Default
-------------------+---------+-----------+----------+---------
part_key | text | | |
count | bigint | | |
max | numeric | | |
min | numeric | | |
sum | numeric | | |
avg | numeric | | |
__ivm_count_max__ | bigint | | |
__ivm_count_min__ | bigint | | |
__ivm_count_sum__ | bigint | | |
__ivm_count_avg__ | bigint | | |
__ivm_sum_avg__ | numeric | | |
__ivm_count__ | bigint | | |

SET client_min_messages = debug5;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:30: DEBUG:
CommitTransaction(1) name: unnamed; blockState: STARTED; state:
INPROGRESS, xid/subid/cid: 0/1/0
SET
INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
'gruop_d');
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG:
StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS,
xid/subid/cid: 0/1/0
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: relation
"public.group_imv" does not exist
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: relation
"public.group_imv" does not exist
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: ERROR: could not
open relation with OID 0
RESET client_min_messages;
psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:34: DEBUG:
StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS,
xid/subid/cid: 0/1/0
RESET
SELECT * FROM group_imv2 ORDER BY part_key;
part_key | count | max | min | sum | avg
----------+-------+-------+------+-----------+---------------------
group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
(3 rows)

ALTER MATERIALIZED VIEW group_imv2 RENAME TO group_imv;
ALTER MATERIALIZED VIEW
INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
'gruop_d');
INSERT 0 1
SELECT * FROM group_imv ORDER BY part_key;
part_key | count | max | min | sum | avg
----------+-------+-------+-------+-----------+---------------------
group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
gruop_d | 1 | 81.43 | 81.43 | 81.43 | 81.4300000000000000
(4 rows)

[ec2-user(at)ip-10-0-1-10 ivm]$
```

This may be because IVM internal information is not modified when the view
name is renamed.

2018年12月27日(木) 21:57 Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>:

> Hi,
>
> I would like to implement Incremental View Maintenance (IVM) on
> PostgreSQL.
> IVM is a technique to maintain materialized views which computes and
> applies
> only the incremental changes to the materialized views rather than
> recomputate the contents as the current REFRESH command does.
>
> I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> [1].
> Our implementation uses row OIDs to compute deltas for materialized
> views.
> The basic idea is that if we have information about which rows in base
> tables
> are contributing to generate a certain row in a matview then we can
> identify
> the affected rows when a base table is updated. This is based on an idea of
> Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> approach[3].
>
> In our implementation, the mapping of the row OIDs of the materialized view
> and the base tables are stored in "OID map". When a base relation is
> modified,
> AFTER trigger is executed and the delta is recorded in delta tables using
> the transition table feature. The accual udpate of the matview is triggerd
> by REFRESH command with INCREMENTALLY option.
>
> However, we realize problems of our implementation. First, WITH OIDS will
> be removed since PG12, so OIDs are no longer available. Besides this, it
> would
> be hard to implement this since it needs many changes of executor nodes to
> collect base tables's OIDs during execuing a query. Also, the cost of
> maintaining
> OID map would be high.
>
> For these reasons, we started to think to implement IVM without relying on
> OIDs
> and made a bit more surveys.
>
> We also looked at Kevin Grittner's discussion [4] on incremental matview
> maintenance. In this discussion, Kevin proposed to use counting algorithm
> [5]
> to handle projection views (using DISTNICT) properly. This algorithm need
> an
> additional system column, count_t, in materialized views and delta tables
> of
> base tables.
>
> However, the discussion about IVM is now stoped, so we would like to
> restart and
> progress this.
>
>
> Through our PoC inplementation and surveys, I think we need to think at
> least
> the followings for implementing IVM.
>
> 1. How to extract changes on base tables
>
> I think there would be at least two approaches for it.
>
> - Using transition table in AFTER triggers
> - Extracting changes from WAL using logical decoding
>
> In our PoC implementation, we used AFTER trigger and transition tables,
> but using
> logical decoding might be better from the point of performance of base
> table
> modification.
>
> If we can represent a change of UPDATE on a base table as query-like
> rather than
> OLD and NEW, it may be possible to update the materialized view directly
> instead
> of performing delete & insert.
>
>
> 2. How to compute the delta to be applied to materialized views
>
> Essentially, IVM is based on relational algebra. Theorically, changes on
> base
> tables are represented as deltas on this, like "R <- R + dR", and the
> delta on
> the materialized view is computed using base table deltas based on "change
> propagation equations". For implementation, we have to derive the
> equation from
> the view definition query (Query tree, or Plan tree?) and describe this as
> SQL
> query to compulte delta to be applied to the materialized view.
>
> There could be several operations for view definition: selection,
> projection,
> join, aggregation, union, difference, intersection, etc. If we can
> prepare a
> module for each operation, it makes IVM extensable, so we can start a
> simple
> view definition, and then support more complex views.
>
>
> 3. How to identify rows to be modifed in materialized views
>
> When applying the delta to the materialized view, we have to identify
> which row
> in the matview is corresponding to a row in the delta. A naive method is
> matching
> by using all columns in a tuple, but clearly this is unefficient. If
> thematerialized
> view has unique index, we can use this. Maybe, we have to force
> materialized views
> to have all primary key colums in their base tables. In our PoC
> implementation, we
> used OID to identify rows, but this will be no longer available as said
> above.
>
>
> 4. When to maintain materialized views
>
> There are two candidates of the timing of maintenance, immediate (eager)
> or deferred.
>
> In eager maintenance, the materialized view is updated in the same
> transaction
> where the base table is updated. In deferred maintenance, this is done
> after the
> transaction is commited, for example, when view is accessed, as a response
> to user
> request, etc.
>
> In the previous discussion[4], it is planned to start from "eager"
> approach. In our PoC
> implementaion, we used the other aproach, that is, using REFRESH command
> to perform IVM.
> I am not sure which is better as a start point, but I begin to think that
> the eager
> approach may be more simple since we don't have to maintain base table
> changes in other
> past transactions.
>
> In the eager maintenance approache, we have to consider a race condition
> where two
> different transactions change base tables simultaneously as discussed in
> [4].
>
>
> [1]
> https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> [2]
> https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> (Japanese only)
> [3] https://dl.acm.org/citation.cfm?id=2750546
> [4]
> https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> [5] https://dl.acm.org/citation.cfm?id=170066
>
> Regards,
> --
> Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message nuko yokohama 2019-12-04 12:43:09 Re: Implementing Incremental View Maintenance
Previous Message Amit Kapila 2019-12-04 12:08:08 Re: [HACKERS] Block level parallel vacuum