Re: Implementing Incremental View Maintenance

From: nuko yokohama <nuko(dot)yokohama(at)gmail(dot)com>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: Implementing Incremental View Maintenance
Date: 2020-01-18 05:39:35
Message-ID: CAF3Gu1ZPMxBeGnnbt4kSUd=6H9JKYJ=gvFj5TqRGabUk8eCrxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.
I understand.
Even if the function name is min, there is a possibility that it is not an
aggregation operation for finding the minimum value, so it is restricted.
I understood aggregation of user-defined types is a constraint.

Also, I agree with the error message improvements.

2020年1月17日(金) 17:12 Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>:

> On Thu, 16 Jan 2020 12:59:11 +0900
> nuko yokohama <nuko(dot)yokohama(at)gmail(dot)com> wrote:
>
> > Aggregate operation of user-defined type cannot be specified
> > (commit e150d964df7e3aeb768e4bae35d15764f8abd284)
> >
> > A SELECT statement using the MIN() and MAX() functions can be executed
> on a
> > user-defined type column that implements the aggregate functions MIN ()
> and
> > MAX ().
> > However, if the same SELECT statement is specified in the AS clause of
> > CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur.
> >
> > ```
> > SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
> > data_min | data_max
> > ----------+----------
> > 1/3 | 2/3
> > (1 row)
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data)
> > data_min FROM foo;
> > psql:extension-agg.sql:14: ERROR: aggregate function min is not
> supported
> > CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data)
> > data_max FROM foo;
> > psql:extension-agg.sql:15: ERROR: aggregate function max is not
> supported
> > ```
> >
> > Does query including user-defined type aggregate operation not supported
> by
> > INCREMENTAL MATERIALIZED VIEW?
>
> The current implementation supports only built-in aggregate functions, so
> user-defined aggregates are not supported, although it is allowed before.
> This is because we can not know how user-defined aggregates behave and if
> it can work safely with IVM. Min/Max on your fraction type may work well,
> but it is possible that some user-defined aggregate functions named min
> or max behave in totally different way than we expected.
>
> In future, maybe it is possible support user-defined aggregates are
> supported
> by extending pg_aggregate and adding support functions for IVM, but there
> is
> not still a concrete plan for now.
>
> BTW, the following error message doesn't look good because built-in min is
> supported, so I will improve it.
>
> ERROR: aggregate function min is not supported
>
> Regards,
> Yugo Nagata
>
> >
> > An execution example is shown below.
> >
> > ```
> > [ec2-user(at)ip-10-0-1-10 ivm]$ cat extension-agg.sql
> > --
> > -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
> > --
> > DROP EXTENSION IF EXISTS pg_fraction CASCADE;
> > DROP TABLE IF EXISTS foo CASCADE;
> >
> > CREATE EXTENSION IF NOT EXISTS pg_fraction;
> > \dx
> > \dT+ fraction
> >
> > CREATE TABLE foo (id int, data fraction);
> > INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
> > SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
> > CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data)
> > data_min FROM foo;
> > CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data)
> > data_max FROM foo;
> >
> > SELECT MIN(id) id_min, MAX(id) id_max FROM foo;
> > CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min,
> > MAX(id) id_max FROM foo;
> > ```
> >
> > Best regards.
> >
> > 2018年12月27日(木) 21:57 Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>:
> >
> > > Hi,
> > >
> > > I would like to implement Incremental View Maintenance (IVM) on
> > > PostgreSQL.
> > > IVM is a technique to maintain materialized views which computes and
> > > applies
> > > only the incremental changes to the materialized views rather than
> > > recomputate the contents as the current REFRESH command does.
> > >
> > > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > > [1].
> > > Our implementation uses row OIDs to compute deltas for materialized
> > > views.
> > > The basic idea is that if we have information about which rows in base
> > > tables
> > > are contributing to generate a certain row in a matview then we can
> > > identify
> > > the affected rows when a base table is updated. This is based on an
> idea of
> > > Dr. Masunaga [2] who is a member of our group and inspired from
> ID-based
> > > approach[3].
> > >
> > > In our implementation, the mapping of the row OIDs of the materialized
> view
> > > and the base tables are stored in "OID map". When a base relation is
> > > modified,
> > > AFTER trigger is executed and the delta is recorded in delta tables
> using
> > > the transition table feature. The accual udpate of the matview is
> triggerd
> > > by REFRESH command with INCREMENTALLY option.
> > >
> > > However, we realize problems of our implementation. First, WITH OIDS
> will
> > > be removed since PG12, so OIDs are no longer available. Besides this,
> it
> > > would
> > > be hard to implement this since it needs many changes of executor
> nodes to
> > > collect base tables's OIDs during execuing a query. Also, the cost of
> > > maintaining
> > > OID map would be high.
> > >
> > > For these reasons, we started to think to implement IVM without
> relying on
> > > OIDs
> > > and made a bit more surveys.
> > >
> > > We also looked at Kevin Grittner's discussion [4] on incremental
> matview
> > > maintenance. In this discussion, Kevin proposed to use counting
> algorithm
> > > [5]
> > > to handle projection views (using DISTNICT) properly. This algorithm
> need
> > > an
> > > additional system column, count_t, in materialized views and delta
> tables
> > > of
> > > base tables.
> > >
> > > However, the discussion about IVM is now stoped, so we would like to
> > > restart and
> > > progress this.
> > >
> > >
> > > Through our PoC inplementation and surveys, I think we need to think at
> > > least
> > > the followings for implementing IVM.
> > >
> > > 1. How to extract changes on base tables
> > >
> > > I think there would be at least two approaches for it.
> > >
> > > - Using transition table in AFTER triggers
> > > - Extracting changes from WAL using logical decoding
> > >
> > > In our PoC implementation, we used AFTER trigger and transition tables,
> > > but using
> > > logical decoding might be better from the point of performance of base
> > > table
> > > modification.
> > >
> > > If we can represent a change of UPDATE on a base table as query-like
> > > rather than
> > > OLD and NEW, it may be possible to update the materialized view
> directly
> > > instead
> > > of performing delete & insert.
> > >
> > >
> > > 2. How to compute the delta to be applied to materialized views
> > >
> > > Essentially, IVM is based on relational algebra. Theorically, changes
> on
> > > base
> > > tables are represented as deltas on this, like "R <- R + dR", and the
> > > delta on
> > > the materialized view is computed using base table deltas based on
> "change
> > > propagation equations". For implementation, we have to derive the
> > > equation from
> > > the view definition query (Query tree, or Plan tree?) and describe
> this as
> > > SQL
> > > query to compulte delta to be applied to the materialized view.
> > >
> > > There could be several operations for view definition: selection,
> > > projection,
> > > join, aggregation, union, difference, intersection, etc. If we can
> > > prepare a
> > > module for each operation, it makes IVM extensable, so we can start a
> > > simple
> > > view definition, and then support more complex views.
> > >
> > >
> > > 3. How to identify rows to be modifed in materialized views
> > >
> > > When applying the delta to the materialized view, we have to identify
> > > which row
> > > in the matview is corresponding to a row in the delta. A naive method
> is
> > > matching
> > > by using all columns in a tuple, but clearly this is unefficient. If
> > > thematerialized
> > > view has unique index, we can use this. Maybe, we have to force
> > > materialized views
> > > to have all primary key colums in their base tables. In our PoC
> > > implementation, we
> > > used OID to identify rows, but this will be no longer available as said
> > > above.
> > >
> > >
> > > 4. When to maintain materialized views
> > >
> > > There are two candidates of the timing of maintenance, immediate
> (eager)
> > > or deferred.
> > >
> > > In eager maintenance, the materialized view is updated in the same
> > > transaction
> > > where the base table is updated. In deferred maintenance, this is done
> > > after the
> > > transaction is commited, for example, when view is accessed, as a
> response
> > > to user
> > > request, etc.
> > >
> > > In the previous discussion[4], it is planned to start from "eager"
> > > approach. In our PoC
> > > implementaion, we used the other aproach, that is, using REFRESH
> command
> > > to perform IVM.
> > > I am not sure which is better as a start point, but I begin to think
> that
> > > the eager
> > > approach may be more simple since we don't have to maintain base table
> > > changes in other
> > > past transactions.
> > >
> > > In the eager maintenance approache, we have to consider a race
> condition
> > > where two
> > > different transactions change base tables simultaneously as discussed
> in
> > > [4].
> > >
> > >
> > > [1]
> > >
> https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > > [2]
> > >
> https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > > (Japanese only)
> > > [3] https://dl.acm.org/citation.cfm?id=2750546
> > > [4]
> > >
> https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > > [5] https://dl.acm.org/citation.cfm?id=170066
> > >
> > > Regards,
> > > --
> > > Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
> > >
> > >
>
>
> --
> Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-01-18 07:04:14 Re: pg13 PGDLLIMPORT list
Previous Message Amit Kapila 2020-01-18 05:29:49 Re: Reorderbuffer crash during recovery