Re: Implementing Incremental View Maintenance

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: huyajun <hu_yajun(at)qq(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "r(dot)takahashi_2(at)fujitsu(dot)com" <r(dot)takahashi_2(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Implementing Incremental View Maintenance
Date: 2022-07-08 10:22:11
Message-ID: 20220708192211.a5a5210b13c2d1911b578a38@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi huyajun,

Thank you for your comments!

On Wed, 29 Jun 2022 17:56:39 +0800
huyajun <hu_yajun(at)qq(dot)com> wrote:

> Hi, Nagata-san
> I read your patch with v27 version and has some new comments,I want to discuss with you.
>
> 1. How about use DEPENDENCY_INTERNAL instead of DEPENDENCY_AUTO
> when record dependence on trigger created by IMV.( related code is in the end of CreateIvmTrigger)
> Otherwise, User can use sql to drop trigger and corrupt IVM, DEPENDENCY_INTERNAL is also semantically more correct
> Crash case like:
> create table t( a int);
> create incremental materialized view s as select * from t;
> drop trigger "IVM_trigger_XXXX”;
> Insert into t values(1);

We use DEPENDENCY_AUTO because we want to delete the triggers when
REFRESH ... WITH NO DATA is performed on the materialized view in order
to disable IVM. Triggers created with DEPENDENCY_INTERNAL cannot be dropped.
Such triggers are re-created when REFRESH ... [WITH DATA] is performed.

We can use DEPENDENCY_INTERNAL if we disable/enable such triggers instead of
dropping/re-creating them, although users also can disable triggers using
ALTER TRIGGER.

> 2. In get_matching_condition_string, Considering NULL values, we can not use simple = operator.
> But how about 'record = record', record_eq treat NULL = NULL
> it should fast than current implementation for only one comparation
> Below is my simple implementation with this, Variables are named arbitrarily..
> I test some cases it’s ok
>
> static char *
> get_matching_condition_string(List *keys)
> {
> StringInfoData match_cond;
> ListCell *lc;
>
> /* If there is no key columns, the condition is always true. */
> if (keys == NIL)
> return "true";
> else
> {
> StringInfoData s1;
> StringInfoData s2;
> initStringInfo(&match_cond);
> initStringInfo(&s1);
> initStringInfo(&s2);
> /* Considering NULL values, we can not use simple = operator. */
> appendStringInfo(&s1, "ROW(");
> appendStringInfo(&s2, "ROW(");
> foreach (lc, keys)
> {
> Form_pg_attribute attr = (Form_pg_attribute) lfirst(lc);
> char *resname = NameStr(attr->attname);
> char *mv_resname = quote_qualified_identifier("mv", resname);
> char *diff_resname = quote_qualified_identifier("diff", resname);
>
> appendStringInfo(&s1, "%s", mv_resname);
> appendStringInfo(&s2, "%s", diff_resname);
>
> if (lnext(lc))
> {
> appendStringInfo(&s1, ", ");
> appendStringInfo(&s2, ", ");
> }
> }
> appendStringInfo(&s1, ")::record");
> appendStringInfo(&s2, ")::record");
> appendStringInfo(&match_cond, "%s operator(pg_catalog.=) %s", s1.data, s2.data);
> return match_cond.data;
> }
> }

As you say, we don't have to use IS NULL if we use ROW(...)::record, but we
cannot use an index in this case and it makes IVM ineffecient. As showed
bellow (#5), an index works even when we use simple = operations together
with together "IS NULL" operations.

> 3. Consider truncate base tables, IVM will not refresh, maybe raise an error will be better

I fixed to support TRUNCATE on base tables in our repository.
https://github.com/sraoss/pgsql-ivm/commit/a1365ed69f34e1adbd160f2ce8fd1e80e032392f

When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.

Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.

The next version of the patch-set will include this change.

> 4. In IVM_immediate_before,I know Lock base table with ExclusiveLock is
> for concurrent updates to the IVM correctly, But how about to Lock it when actually
> need to maintain MV which in IVM_immediate_maintenance
> In this way you don't have to lock multiple times.

Yes, as you say, we don't have to lock the view multiple times.
I'll investigate better locking ways including the way that you suggest.

> 5. Why we need CreateIndexOnIMMV, is it a optimize?
> It seems like when maintenance MV,
> the index may not be used because of our match conditions can’t use simple = operator

No, the index works even when we use simple = operator together with "IS NULL".
For example:

postgres=# \d mv
Materialized view "public.mv"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
v1 | integer | | |
v2 | integer | | |
Indexes:
"mv_index" UNIQUE, btree (id) NULLS NOT DISTINCT

postgres=# EXPLAIN ANALYZE
WITH diff(id, v1, v2) AS MATERIALIZED ((VALUES(42, 420, NULL::int)))
SELECT mv.* FROM mv, diff
WHERE (mv.id = diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND
(mv.v1 = diff.v1 OR (mv.v1 IS NULL AND diff.v1 IS NULL)) AND
(mv.v2 = diff.v2 OR (mv.v2 IS NULL AND diff.v2 IS NULL));

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------
Nested Loop (cost=133.87..137.92 rows=1 width=12) (actual time=0.180..0.191 rows=1 loops=1)
CTE diff
-> Result (cost=0.00..0.01 rows=1 width=12) (actual time=0.027..0.028 rows=1 loops=1)
-> CTE Scan on diff (cost=0.00..0.02 rows=1 width=12) (actual time=0.037..0.040 rows=1 loops=1)
-> Bitmap Heap Scan on mv (cost=133.86..137.88 rows=1 width=12) (actual time=0.127..0.132 rows=1 loops=1)
Recheck Cond: ((id = diff.id) OR (id IS NULL))
Filter: (((id = diff.id) OR ((id IS NULL) AND (diff.id IS NULL))) AND ((v1 = diff.v1) OR ((v1 IS NULL) AND (diff.v1 IS NULL))) AND ((v2 = diff.v2) OR ((v2 IS NULL) AND (diff.v2
IS NULL))))
Heap Blocks: exact=1
-> BitmapOr (cost=133.86..133.86 rows=1 width=0) (actual time=0.091..0.093 rows=0 loops=1)
-> Bitmap Index Scan on mv_index (cost=0.00..4.43 rows=1 width=0) (actual time=0.065..0.065 rows=1 loops=1)
Index Cond: (id = diff.id)
-> Bitmap Index Scan on mv_index (cost=0.00..4.43 rows=1 width=0) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: (id IS NULL)
Planning Time: 0.666 ms
Execution Time: 0.399 ms
(15 rows)

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-07-08 10:27:27 Re: Support for grabbing multiple consecutive values with nextval()
Previous Message Bharath Rupireddy 2022-07-08 10:09:14 Re: Add function to return backup_label and tablespace_map