Re: Implementing Incremental View Maintenance

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: nuko yokohama <nuko(dot)yokohama(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing Incremental View Maintenance
Date: 2020-02-19 00:11:35
Message-ID: 20200219091135.79df6f6c63218fbff4b32d5c@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 18 Feb 2020 22:03:47 +0900
nuko yokohama <nuko(dot)yokohama(at)gmail(dot)com> wrote:

> Hi.
>
> SELECT statements with a TABLESAMPLE clause should be rejected.
>
> Currently, CREATE INCREMENTAL MATERIALIZED VIEW allows SELECT statements
> with the TABLESAMPLE clause.
> However, the result of this SELECT statement is undefined and should be
> rejected when specified in CREATE INCREMENTAL MATERIALIZED VIEW.
> (similar to handling non-immutable functions)

Thanks! We totally agree with you. We are now working on improvement of
query checks at creating IMMV. TABLESAMPLE will also be checked in this.

Regards,
Yugo Nagata

> Regard.
>
> 2020年2月8日(土) 11:15 nuko yokohama <nuko(dot)yokohama(at)gmail(dot)com>:
>
> > Hi.
> >
> > UNION query problem.(server crash)
> >
> > When creating an INCREMENTAL MATERIALIZED VIEW,
> > the server process crashes if you specify a query with a UNION.
> >
> > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
> >
> > execute log.
> >
> > ```
> > [ec2-user(at)ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> > DROP TABLE IF EXISTS table_x CASCADE;
> > psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
> > DROP TABLE
> > DROP TABLE IF EXISTS table_y CASCADE;
> > DROP TABLE
> > CREATE TABLE table_x (id int, data numeric);
> > CREATE TABLE
> > CREATE TABLE table_y (id int, data numeric);
> > CREATE TABLE
> > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > SELECT * FROM table_x;
> > id | data
> > ----+--------------------
> > 1 | 0.950724735058774
> > 2 | 0.0222670808201144
> > 3 | 0.391258547114841
> > (3 rows)
> >
> > SELECT * FROM table_y;
> > id | data
> > ----+--------------------
> > 1 | 0.991717347778337
> > 2 | 0.0528458947672874
> > 3 | 0.965044982911163
> > (3 rows)
> >
> > CREATE VIEW xy_union_v AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > CREATE VIEW
> > TABLE xy_union_v;
> > name | id | data
> > ---------+----+--------------------
> > table_y | 2 | 0.0528458947672874
> > table_x | 2 | 0.0222670808201144
> > table_y | 3 | 0.965044982911163
> > table_x | 1 | 0.950724735058774
> > table_x | 3 | 0.391258547114841
> > table_y | 1 | 0.991717347778337
> > (6 rows)
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > psql:union_query_crash.sql:28: server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > psql:union_query_crash.sql:28: fatal: connection to server was lost
> > ```
> > UNION query problem.(server crash)
> >
> > When creating an INCREMENTAL MATERIALIZED VIEW,
> > the server process crashes if you specify a query with a UNION.
> >
> > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
> >
> > execute log.
> >
> > ```
> > [ec2-user(at)ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> > DROP TABLE IF EXISTS table_x CASCADE;
> > psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
> > DROP TABLE
> > DROP TABLE IF EXISTS table_y CASCADE;
> > DROP TABLE
> > CREATE TABLE table_x (id int, data numeric);
> > CREATE TABLE
> > CREATE TABLE table_y (id int, data numeric);
> > CREATE TABLE
> > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > SELECT * FROM table_x;
> > id | data
> > ----+--------------------
> > 1 | 0.950724735058774
> > 2 | 0.0222670808201144
> > 3 | 0.391258547114841
> > (3 rows)
> >
> > SELECT * FROM table_y;
> > id | data
> > ----+--------------------
> > 1 | 0.991717347778337
> > 2 | 0.0528458947672874
> > 3 | 0.965044982911163
> > (3 rows)
> >
> > CREATE VIEW xy_union_v AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > CREATE VIEW
> > TABLE xy_union_v;
> > name | id | data
> > ---------+----+--------------------
> > table_y | 2 | 0.0528458947672874
> > table_x | 2 | 0.0222670808201144
> > table_y | 3 | 0.965044982911163
> > table_x | 1 | 0.950724735058774
> > table_x | 3 | 0.391258547114841
> > table_y | 1 | 0.991717347778337
> > (6 rows)
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > psql:union_query_crash.sql:28: server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > psql:union_query_crash.sql:28: fatal: connection to server was lost
> > ```
> >
> > 2018年12月27日(木) 21:57 Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>:
> >
> >> Hi,
> >>
> >> I would like to implement Incremental View Maintenance (IVM) on
> >> PostgreSQL.
> >> IVM is a technique to maintain materialized views which computes and
> >> applies
> >> only the incremental changes to the materialized views rather than
> >> recomputate the contents as the current REFRESH command does.
> >>
> >> I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> >> [1].
> >> Our implementation uses row OIDs to compute deltas for materialized
> >> views.
> >> The basic idea is that if we have information about which rows in base
> >> tables
> >> are contributing to generate a certain row in a matview then we can
> >> identify
> >> the affected rows when a base table is updated. This is based on an idea
> >> of
> >> Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> >> approach[3].
> >>
> >> In our implementation, the mapping of the row OIDs of the materialized
> >> view
> >> and the base tables are stored in "OID map". When a base relation is
> >> modified,
> >> AFTER trigger is executed and the delta is recorded in delta tables using
> >> the transition table feature. The accual udpate of the matview is triggerd
> >> by REFRESH command with INCREMENTALLY option.
> >>
> >> However, we realize problems of our implementation. First, WITH OIDS will
> >> be removed since PG12, so OIDs are no longer available. Besides this, it
> >> would
> >> be hard to implement this since it needs many changes of executor nodes to
> >> collect base tables's OIDs during execuing a query. Also, the cost of
> >> maintaining
> >> OID map would be high.
> >>
> >> For these reasons, we started to think to implement IVM without relying
> >> on OIDs
> >> and made a bit more surveys.
> >>
> >> We also looked at Kevin Grittner's discussion [4] on incremental matview
> >> maintenance. In this discussion, Kevin proposed to use counting
> >> algorithm [5]
> >> to handle projection views (using DISTNICT) properly. This algorithm need
> >> an
> >> additional system column, count_t, in materialized views and delta tables
> >> of
> >> base tables.
> >>
> >> However, the discussion about IVM is now stoped, so we would like to
> >> restart and
> >> progress this.
> >>
> >>
> >> Through our PoC inplementation and surveys, I think we need to think at
> >> least
> >> the followings for implementing IVM.
> >>
> >> 1. How to extract changes on base tables
> >>
> >> I think there would be at least two approaches for it.
> >>
> >> - Using transition table in AFTER triggers
> >> - Extracting changes from WAL using logical decoding
> >>
> >> In our PoC implementation, we used AFTER trigger and transition tables,
> >> but using
> >> logical decoding might be better from the point of performance of base
> >> table
> >> modification.
> >>
> >> If we can represent a change of UPDATE on a base table as query-like
> >> rather than
> >> OLD and NEW, it may be possible to update the materialized view directly
> >> instead
> >> of performing delete & insert.
> >>
> >>
> >> 2. How to compute the delta to be applied to materialized views
> >>
> >> Essentially, IVM is based on relational algebra. Theorically, changes on
> >> base
> >> tables are represented as deltas on this, like "R <- R + dR", and the
> >> delta on
> >> the materialized view is computed using base table deltas based on "change
> >> propagation equations". For implementation, we have to derive the
> >> equation from
> >> the view definition query (Query tree, or Plan tree?) and describe this
> >> as SQL
> >> query to compulte delta to be applied to the materialized view.
> >>
> >> There could be several operations for view definition: selection,
> >> projection,
> >> join, aggregation, union, difference, intersection, etc. If we can
> >> prepare a
> >> module for each operation, it makes IVM extensable, so we can start a
> >> simple
> >> view definition, and then support more complex views.
> >>
> >>
> >> 3. How to identify rows to be modifed in materialized views
> >>
> >> When applying the delta to the materialized view, we have to identify
> >> which row
> >> in the matview is corresponding to a row in the delta. A naive method is
> >> matching
> >> by using all columns in a tuple, but clearly this is unefficient. If
> >> thematerialized
> >> view has unique index, we can use this. Maybe, we have to force
> >> materialized views
> >> to have all primary key colums in their base tables. In our PoC
> >> implementation, we
> >> used OID to identify rows, but this will be no longer available as said
> >> above.
> >>
> >>
> >> 4. When to maintain materialized views
> >>
> >> There are two candidates of the timing of maintenance, immediate (eager)
> >> or deferred.
> >>
> >> In eager maintenance, the materialized view is updated in the same
> >> transaction
> >> where the base table is updated. In deferred maintenance, this is done
> >> after the
> >> transaction is commited, for example, when view is accessed, as a
> >> response to user
> >> request, etc.
> >>
> >> In the previous discussion[4], it is planned to start from "eager"
> >> approach. In our PoC
> >> implementaion, we used the other aproach, that is, using REFRESH command
> >> to perform IVM.
> >> I am not sure which is better as a start point, but I begin to think that
> >> the eager
> >> approach may be more simple since we don't have to maintain base table
> >> changes in other
> >> past transactions.
> >>
> >> In the eager maintenance approache, we have to consider a race condition
> >> where two
> >> different transactions change base tables simultaneously as discussed in
> >> [4].
> >>
> >>
> >> [1]
> >> https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> >> [2]
> >> https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> >> (Japanese only)
> >> [3] https://dl.acm.org/citation.cfm?id=2750546
> >> [4]
> >> https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> >> [5] https://dl.acm.org/citation.cfm?id=170066
> >>
> >> Regards,
> >> --
> >> Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
> >>
> >>

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Zhang 2020-02-19 00:16:09 Re: Fastpath while arranging the changes in LSN order in logical decoding
Previous Message Melanie Plageman 2020-02-18 23:31:22 Re: Memory-Bounded Hash Aggregation