RE: Implementing Incremental View Maintenance

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Tatsuo Ishii' <ishii(at)sraoss(dot)co(dot)jp>
Cc: "nagata(at)sraoss(dot)co(dot)jp" <nagata(at)sraoss(dot)co(dot)jp>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "amitlangote09(at)gmail(dot)com" <amitlangote09(at)gmail(dot)com>, "hoshiai(at)sraoss(dot)co(dot)jp" <hoshiai(at)sraoss(dot)co(dot)jp>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "kgrittn(at)gmail(dot)com" <kgrittn(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Implementing Incremental View Maintenance
Date: 2019-12-23 07:43:23
Message-ID: OSAPR01MB507340694288FBA8CE0870BAFE2E0@OSAPR01MB5073.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
> the target data? In the discussion, someone referred to master data with low
> update frequency, because the proposed IVM implementation adds triggers on
> source tables, which limits the applicability to update-heavy tables.
>
> But if you want to get always up-to-data you need to pay the cost for
> REFRESH MATERIALIZED VIEW. IVM gives a choice here.

Thank you, that clarified to some extent. What kind of data do you think of as an example?

Materialized view reminds me of the use in a data warehouse. Oracle handles the top in its Database Data Warehousing Guide, and Microsoft has just started to offer the materialized view feature in its Azure Synapse Analytics (formerly SQL Data Warehouse). AWS also has previewed Redshift's materialized view feature in re:Invent 2019. Are you targeting the data warehouse (analytics) workload?

IIUC, to put (over) simply, the data warehouse has two kind of tables:

* Facts (transaction data): e.g. sales, user activity
Large amount. INSERT only on a regular basis (ETL/ELT) or continuously (streaming)

* Dimensions (master/reference data): e.g. product, customer, time, country
Small amount. Infrequently INSERTed or UPDATEd.

The proposed trigger-based approach does not seem to be suitable for the facts, because the trigger overhead imposed on data loading may offset or exceed the time saved by incrementally refreshing the materialized views.

Then, does the proposed feature fit the dimension tables? If the materialized view is only based on the dimension data, then the full REFRESH of the materialized view wouldn't take so long. The typical materialized view should join the fact and dimension tables. Then, the fact table will have to have the triggers, causing the data loading slowdown.

I'm saying this because I'm concerned about the trigger based overhead. As you know, Oracle uses materialized view logs to save changes and incrementally apply them later to the materialized views (REFRESH ON STATEMENT materialized views doesn't require the materialized view log, so it might use triggers.) Does any commercial grade database implement materialized view using triggers? I couldn't find relevant information regarding Azure Synapse and Redshift.

If our only handy option is a trigger, can we minimize the overhead by doing the view maintenance at transaction commit?

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2019-12-23 08:08:53 RE: Implementing Incremental View Maintenance
Previous Message Mahendra Singh 2019-12-23 07:24:44 Re: [HACKERS] Block level parallel vacuum