Re: Implementing Incremental View Maintenance

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: tsunakawa(dot)takay(at)fujitsu(dot)com
Cc: ishii(at)sraoss(dot)co(dot)jp, nagata(at)sraoss(dot)co(dot)jp, michael(at)paquier(dot)xyz, amitlangote09(at)gmail(dot)com, hoshiai(at)sraoss(dot)co(dot)jp, alvherre(at)2ndquadrant(dot)com, thomas(dot)munro(at)gmail(dot)com, kgrittn(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implementing Incremental View Maintenance
Date: 2019-12-24 08:09:09
Message-ID: 20191224.170909.2237007779818875622.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Unfortunately, it's not clear to me which of ON STATEMENT or ON COMMIT the user should choose. The benefit of ON STATEMENT is that the user does not have to create and maintain the materialized view log. But I'm not sure if and when the benefit defeats the performance overhead on DML statements. It's not disclosed whether ON STATEMENT uses triggers.

AFAIK benefit of ON STATEMENT is the transaction can see the result of
update to the base tables. With ON COMMIT, the transaction does not
see the result until the transaction commits.

> Could you give your opinion on the following to better understand the proposed feature and/or Oracle's ON STATEMENT refresh mode?
>
> * What use case does the feature fit?
> If the trigger makes it difficult to use in the data ware house, does the feature target OLTP?

Well, I can see use cases of IVM in both DWH and OLTP.

For example, a user create a DWH-like data using materialized
view. After the initial data is loaded, the data is seldom updated.
However one day a user wants to change just one row to see how it
affects to the whole DWH data. IVM will help here because it could be
done in shorter time than loading whole data.

Another use case is a ticket selling system. The system shows how many
tickets remain in a real time manner. For this purpose it needs to
count the number of tickets already sold from a log table. By using
IVM, it could be accomplished in simple and effective way.

> What kind of data and query would benefit most from the feature (e.g. join of a large sales table and a small product table, where the data volume and frequency of data loading is ...)?
> In other words, this is about what kind of example we can recommend as a typical use case of this feature.

Here are some use cases suitable for IVM I can think of:

- Users are creating home made triggers to get data from tables. Since
IVM could eliminates some of those triggers, we could expect less
maintenance cost and bugs accidentally brought in when the triggers
were created.

- Any use case in which the cost of refreshing whole result table
(materialized view) is so expensive that it justifies the cost of
updating of base tables. See the example of use cases above.

> * Do you think the benefit of ON STATEMENT (i.e. do not have to use materialized view log) outweighs the drawback of ON STATEMENT (i.g. DML overhead)?

Outweights to what?

> * Do you think it's important to refresh the materialized view after every statement, or the per-statement refresh is not a requirement but simply the result of implementation?

I think it's important to refresh the materialized view after every
statement and the benefit for users are apparent because it brings
real time data refresh to users.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2019-12-24 08:15:40 Re: Columns correlation and adaptive query optimization
Previous Message Amit Langote 2019-12-24 07:55:14 Re: unsupportable composite type partition keys