Re: Implementing Incremental View Maintenance

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, michael(at)paquier(dot)xyz, thomas(dot)munro(at)gmail(dot)com, takuma(dot)hoshiai(at)gmail(dot)com, amitlangote09(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, kgrittn(at)gmail(dot)com
Subject: Re: Implementing Incremental View Maintenance
Date: 2020-10-16 10:30:34
Message-ID: 20201016193034.9a4c44c79fc1eca7babe093e@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I have reviewed the past discussions in this thread on IVM implementation
of the proposed patch[1], and summarized it as following . We would appreciate
any comments or suggestions on the patch as regard of them.

* Aggregate support

The current patch supports several built-in aggregates, that is, count, sum,
avg, min, and max. Other built-in aggregates or user-defined aggregates are
not supported.

Aggregates in a materialized view definition is checked if this is supported
using OIDs of aggregate function. For this end, Gen_fmgrtab.pl is changed to
output aggregate function's OIDs to fmgroids.h
(by 0006-Change-Gen_fmgrtab.pl-to-output-aggregate-function-s.patch).
The logic for each aggregate function to update aggregate values in
materialized views is enbedded in a trigger function.

There was another option in the past discussion. That is, we could add one
or more new attribute to pg_aggregate which provides information about if
each aggregate function supports IVM and its logic[2]. If we have a mechanism
to support IVM in pg_aggregate, we may use more general aggregate functions
including user-defined aggregate in materialized views for IVM.

For example, the current pg_aggregate has aggcombinefn attribute for
supporting partial aggregation. Maybe we could use combine functions to
calculate new aggregate values in materialized views when tuples are
inserted into a base table. However, in the context of IVM, we also need
other function used when tuples are deleted from a base table, so we can not
use partial aggregation for IVM in the current implementation.

Maybe, we could support the deletion case by adding a new support function,
say, "inverse combine function". The "inverse combine function" would take
aggregate value in a materialized view and aggregate value calculated from a
delta of view, and produces the new aggregate value which equals the result
after tuples in a base table are deleted.

However, we don't have concrete plan for the new design of pg_aggregate.
In addition, even if make a new support function in pg_aggregate for IVM,
we can't use this in the current IVM code because our code uses SQL via SPI
in order to update a materialized view and we can't call "internal" type
function directly in SQL.

For these reasons, in the current patch, we decided to left supporting
general aggregates to the next version for simplicity, so the current patch
supports only some built-in aggregates and checks if they can be used in IVM
by their OIDs.

* Hidden columns

For supporting aggregates, DISTINCT, and EXISTS, the current implementation
automatically create hidden columns whose name starts with "__ivm_" in
materialized views.

The columns starting with "__ivm_" are hidden, so when "SELECT * FROM ..." is
issued to a materialized view, these are invisible for users. Users can not
use such name as a user column in materialized views with IVM support.

As for how to make internal columns invisible to SELECT *, previously there
have been discussions about doing that using a new flag in pg_attribute[3].
However, the discussion is no longer active. So, we decided to use column
name for checking if this is special or not in our IVM implementation
for now.

* TRUNCATE support

Currently, TRUNCATE on base tables are not supported. When TRUNCATE command
is executed on a base table, it is ignored and nothing occurs on materialized
views.

There are another options as followings:

- Raise an error or warning when a base table is TRUNCATEed.
- Make the view non-scannable (like REFRESH WITH NO DATA)
- Update the view in any way. It would be easy for inner joins
or aggregate views, but there is some difficult with outer joins.

Which is the best way? Should we support TRUNCATE in the first version?
Any suggestions would be greatly appreciated.

[1] https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
[2] https://www.postgresql.org/message-id/20191129173328.e5a0e9f81e369a3769c4fd0c%40sraoss.co.jp
[3] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com

Regard,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yuki Seino 2020-10-16 10:47:50 Re: Feature improvement for pg_stat_statements
Previous Message Amit Kapila 2020-10-16 10:26:51 Re: Parallel INSERT (INTO ... SELECT ...)