Re: Implementing Incremental View Maintenance

From: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
To: nuko yokohama <nuko(dot)yokohama(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Implementing Incremental View Maintenance
Date: 2019-12-05 01:19:51
Message-ID: 20191205101951.06928902c27004b6457b6b17@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 4 Dec 2019 21:18:02 +0900
nuko yokohama <nuko(dot)yokohama(at)gmail(dot)com> wrote:

> Hi.
>
> I found the problem after running "ALTER MATERIALIZED VIEW ... RENAME TO".
> If a view created with "CREATE INCREMENT MATERIALIZED VIEW" is renamed,
> subsequent INSERT operations to the base table will fail.
>
> Error message.
> ```
> ERROR: could not open relation with OID 0

Thank you for your pointing out this issue! This error occurs
because the view's OID is retrieved using the view name.
Considering that the name can be changed, this is obviously
wrong. We'll fix it.

Regards,
Yugo Nagata

> ```
>
> Execution log.
> ```
> [ec2-user(at)ip-10-0-1-10 ivm]$ psql -U postgres test -e -f
> ~/test/ivm/alter_rename_bug.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:1: NOTICE: drop cascades
> to materialized view group_imv
> DROP TABLE
> CREATE TABLE table_x AS
> SELECT generate_series(1, 10000) AS id,
> ROUND(random()::numeric * 100, 2) AS data,
> CASE (random() * 5)::integer
> WHEN 4 THEN 'group-a'
> WHEN 3 THEN 'group-b'
> ELSE 'group-c'
> END AS part_key
> ;
> SELECT 10000
> Table "public.table_x"
> Column | Type | Collation | Nullable | Default
> ----------+---------+-----------+----------+---------
> id | integer | | |
> data | numeric | | |
> part_key | text | | |
>
> DROP MATERIALIZED VIEW IF EXISTS group_imv;
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:15: NOTICE: materialized
> view "group_imv" does not exist, skipping
> DROP MATERIALIZED VIEW
> CREATE INCREMENTAL MATERIALIZED VIEW group_imv AS
> SELECT part_key, COUNT(*), MAX(data), MIN(data), SUM(data), AVG(data)
> FROM table_x
> GROUP BY part_key;
> SELECT 3
> List of relations
> Schema | Name | Type | Owner
> --------+-----------+-------------------+----------
> public | group_imv | materialized view | postgres
> public | table_x | table | postgres
> (2 rows)
>
> Materialized view "public.group_imv"
> Column | Type | Collation | Nullable | Default
> -------------------+---------+-----------+----------+---------
> part_key | text | | |
> count | bigint | | |
> max | numeric | | |
> min | numeric | | |
> sum | numeric | | |
> avg | numeric | | |
> __ivm_count_max__ | bigint | | |
> __ivm_count_min__ | bigint | | |
> __ivm_count_sum__ | bigint | | |
> __ivm_count_avg__ | bigint | | |
> __ivm_sum_avg__ | numeric | | |
> __ivm_count__ | bigint | | |
>
> SELECT * FROM group_imv ORDER BY part_key;
> part_key | count | max | min | sum | avg
> ----------+-------+-------+------+-----------+---------------------
> group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
> group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
> group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
> (3 rows)
>
> ALTER MATERIALIZED VIEW group_imv RENAME TO group_imv2;
> ALTER MATERIALIZED VIEW
> List of relations
> Schema | Name | Type | Owner
> --------+------------+-------------------+----------
> public | group_imv2 | materialized view | postgres
> public | table_x | table | postgres
> (2 rows)
>
> Materialized view "public.group_imv2"
> Column | Type | Collation | Nullable | Default
> -------------------+---------+-----------+----------+---------
> part_key | text | | |
> count | bigint | | |
> max | numeric | | |
> min | numeric | | |
> sum | numeric | | |
> avg | numeric | | |
> __ivm_count_max__ | bigint | | |
> __ivm_count_min__ | bigint | | |
> __ivm_count_sum__ | bigint | | |
> __ivm_count_avg__ | bigint | | |
> __ivm_sum_avg__ | numeric | | |
> __ivm_count__ | bigint | | |
>
> SET client_min_messages = debug5;
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:30: DEBUG:
> CommitTransaction(1) name: unnamed; blockState: STARTED; state:
> INPROGRESS, xid/subid/cid: 0/1/0
> SET
> INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
> 'gruop_d');
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG:
> StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS,
> xid/subid/cid: 0/1/0
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: relation
> "public.group_imv" does not exist
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG: relation
> "public.group_imv" does not exist
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: ERROR: could not
> open relation with OID 0
> RESET client_min_messages;
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:34: DEBUG:
> StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS,
> xid/subid/cid: 0/1/0
> RESET
> SELECT * FROM group_imv2 ORDER BY part_key;
> part_key | count | max | min | sum | avg
> ----------+-------+-------+------+-----------+---------------------
> group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
> group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
> group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
> (3 rows)
>
> ALTER MATERIALIZED VIEW group_imv2 RENAME TO group_imv;
> ALTER MATERIALIZED VIEW
> INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
> 'gruop_d');
> INSERT 0 1
> SELECT * FROM group_imv ORDER BY part_key;
> part_key | count | max | min | sum | avg
> ----------+-------+-------+-------+-----------+---------------------
> group-a | 1966 | 99.85 | 0.05 | 98634.93 | 50.1703611393692777
> group-b | 2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
> group-c | 6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
> gruop_d | 1 | 81.43 | 81.43 | 81.43 | 81.4300000000000000
> (4 rows)
>
> [ec2-user(at)ip-10-0-1-10 ivm]$
> ```
>
> This may be because IVM internal information is not modified when the view
> name is renamed.
>
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>:
>
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance. In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> > - Using transition table in AFTER triggers
> > - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations". For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join, aggregation, union, difference, intersection, etc. If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta. A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables. In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
> >
> >

--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2019-12-05 01:24:00 Re: Memory-Bounded Hash Aggregation
Previous Message Peter Geoghegan 2019-12-05 00:54:58 Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.