RE: Implementing Incremental View Maintenance

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Tatsuo Ishii' <ishii(at)sraoss(dot)co(dot)jp>
Cc: "nagata(at)sraoss(dot)co(dot)jp" <nagata(at)sraoss(dot)co(dot)jp>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "amitlangote09(at)gmail(dot)com" <amitlangote09(at)gmail(dot)com>, "hoshiai(at)sraoss(dot)co(dot)jp" <hoshiai(at)sraoss(dot)co(dot)jp>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "kgrittn(at)gmail(dot)com" <kgrittn(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Implementing Incremental View Maintenance
Date: 2019-12-24 06:52:35
Message-ID: OSAPR01MB50739CFBFEE8198684171D8CFE290@OSAPR01MB5073.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
> First of all, we do not think that current approach is the final
> one. Instead we want to implement IVM feature one by one: i.e. we
> start with "immediate update" approach, because it's simple and easier
> to implement. Then we will add "deferred update" mode later on.

I agree about incremental feature introduction. What I'm simply asking is the concrete use case (workload and data), so that I can convince myself to believe that this feature is useful and focus on reviewing and testing (because the patch seems big and difficult...)

> In fact Oracle has both "immediate update" and "deferred update" mode
> of IVM (actually there are more "mode" with their implementation).
>
> I recommend you to look into Oracle's materialized view feature
> closely. For fair evaluation, probably we should compare the IVM patch
> with Oracle's "immediate update" (they call it "on statement") mode.
>
> Probably deferred IVM mode is more suitable for DWH. However as I said
> earlier, we hope to implement the immediate mode first then add the
> deferred mode. Let's start with simple one then add more features.

Yes, I know Oracle's ON STATEMENT refresh mode (I attached references at the end for others.)

Unfortunately, it's not clear to me which of ON STATEMENT or ON COMMIT the user should choose. The benefit of ON STATEMENT is that the user does not have to create and maintain the materialized view log. But I'm not sure if and when the benefit defeats the performance overhead on DML statements. It's not disclosed whether ON STATEMENT uses triggers.

Could you give your opinion on the following to better understand the proposed feature and/or Oracle's ON STATEMENT refresh mode?

* What use case does the feature fit?
If the trigger makes it difficult to use in the data ware house, does the feature target OLTP?
What kind of data and query would benefit most from the feature (e.g. join of a large sales table and a small product table, where the data volume and frequency of data loading is ...)?
In other words, this is about what kind of example we can recommend as a typical use case of this feature.

* Do you think the benefit of ON STATEMENT (i.e. do not have to use materialized view log) outweighs the drawback of ON STATEMENT (i.g. DML overhead)?

* Do you think it's important to refresh the materialized view after every statement, or the per-statement refresh is not a requirement but simply the result of implementation?

[References]
https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/refreshing-materialized-views.html#GUID-C40C225A-8328-44D5-AE90-9078C2C773EA
--------------------------------------------------
7.1.5 About ON COMMIT Refresh for Materialized Views

A materialized view can be refreshed automatically using the ON COMMIT method. Therefore, whenever a transaction commits which has updated the tables on which a materialized view is defined, those changes are automatically reflected in the materialized view. The advantage of using this approach is you never have to remember to refresh the materialized view. The only disadvantage is the time required to complete the commit will be slightly longer because of the extra processing involved. However, in a data warehouse, this should not be an issue because there is unlikely to be concurrent processes trying to update the same table.

7.1.6 About ON STATEMENT Refresh for Materialized Views

A materialized view that uses the ON STATEMENT refresh mode is automatically refreshed every time a DML operation is performed on any of the materialized view’s base tables.

With the ON STATEMENT refresh mode, any changes to the base tables are immediately reflected in the materialized view. There is no need to commit the transaction or maintain materialized view logs on the base tables. If the DML statements are subsequently rolled back, then the corresponding changes made to the materialized view are also rolled back.

The advantage of the ON STATEMENT refresh mode is that the materialized view is always synchronized with the data in the base tables, without the overhead of maintaining materialized view logs. However, this mode may increase the time taken to perform a DML operation because the materialized view is being refreshed as part of the DML operation.
--------------------------------------------------

https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/release-changes.html#GUID-2A2D6E3B-A3FD-47A8-82A3-1EF95AEF5993
--------------------------------------------------
ON STATEMENT refresh mode for materialized views
The ON STATEMENT refresh mode refreshes materialized views every time a DML operation is performed on any base table, without the need to commit the transaction. This mode does not require you to maintain materialized view logs on the base tables.
--------------------------------------------------

http://www.oracle.com/us/solutions/sap/matview-refresh-db12c-2877319.pdf
--------------------------------------------------
We have introduced a new Materialized View (MV) refresh mechanism called ON STATEMENT refresh. With the ON STATEMENT refresh method, an MV is automatically refreshed whenever DML happens on a base table of the MV. Therefore, whenever a DML happens on any table on which a materialized view is defined, the change is automatically reflected in the materialized view. The advantage of using this approach is that the user no long needs to create a materialized view log on each of the base table in order to do fast refresh. The refresh can then avoid the overhead introduced by MV logging but still keep the materialized view refreshed all the time.

Specify ON STATEMENT to indicate that a fast refresh is to occur whenever DML happens on a base table of the materialized view. This is to say, ON STATEMENT materialized view is always in sync with base table changes even before the transaction commits. If a transaction that made changes to the base tables rolls back, the corresponding changes in on statement MV are rolled back as well. This clause may increase the time taken to complete a DML, because the database performs the refresh operation as part of the DML execution. However, unlike other types of fast refreshable materialized views, ON STATEMENT MV refresh no longer requires MV log on the base tables or any extra work on MV logs in order to do fast refresh.
--------------------------------------------------

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2019-12-24 07:07:35 RE: Implementing Incremental View Maintenance
Previous Message Masahiko Sawada 2019-12-24 06:46:22 Re: [HACKERS] Block level parallel vacuum