Re: Incremental View Maintenance, take 2

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incremental View Maintenance, take 2
Date: 2023-09-01 07:42:17
Message-ID: CACJufxEoCCJE1vntJp1SWjen8vBUa3vZLgL=swPwar4zim976g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

hi
based on v29.
based on https://stackoverflow.com/a/4014981/1560347:
I added a new function append_update_set_caluse, and deleted
functions: {append_set_clause_for_count, append_set_clause_for_sum,
append_set_clause_for_avg, append_set_clause_for_minmax}

I guess this way is more extensible/generic than yours.

replaced the following code with the generic function: append_update_set_caluse.
+ /* For views with aggregates, we need to build SET clause for
updating aggregate
+ * values. */
+ if (query->hasAggs && IsA(tle->expr, Aggref))
+ {
+ Aggref *aggref = (Aggref *) tle->expr;
+ const char *aggname = get_func_name(aggref->aggfnoid);
+
+ /*
+ * We can use function names here because it is already checked if these
+ * can be used in IMMV by its OID at the definition time.
+ */
+
+ /* count */
+ if (!strcmp(aggname, "count"))
+ append_set_clause_for_count(resname, aggs_set_old, aggs_set_new,
aggs_list_buf);
+
+ /* sum */
+ else if (!strcmp(aggname, "sum"))
+ append_set_clause_for_sum(resname, aggs_set_old, aggs_set_new, aggs_list_buf);
+
+ /* avg */
+ else if (!strcmp(aggname, "avg"))
+ append_set_clause_for_avg(resname, aggs_set_old, aggs_set_new, aggs_list_buf,
+ format_type_be(aggref->aggtype));
+
+ else
+ elog(ERROR, "unsupported aggregate function: %s", aggname);
+ }
----------------------<<<
attached is my refactor. there is some whitespace errors in the
patches, you need use
git apply --reject --whitespace=fix
basedon_v29_matview_c_refactor_update_set_clause.patch

Also you patch cannot use git apply, i finally found out bulk apply
using gnu patch from
https://serverfault.com/questions/102324/apply-multiple-patch-files.
previously I just did it manually one by one.

I think if you use { for i in $PATCHES/v29*.patch; do patch -p1 < $i;
done } GNU patch, it will generate an .orig file for every modified
file?
-----------------<<<<<
src/backend/commands/matview.c
2268: /* For tuple deletion */
maybe "/* For tuple deletion and update*/" is more accurate?
-----------------<<<<<
currently at here: src/test/regress/sql/incremental_matview.sql
98: -- support SUM(), COUNT() and AVG() aggregate functions
99: BEGIN;
100: CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i,
SUM(j), COUNT(i), AVG(j) FROM mv_base_a GROUP BY i;
101: SELECT * FROM mv_ivm_agg ORDER BY 1,2,3,4;
102: INSERT INTO mv_base_a VALUES(2,100);

src/backend/commands/matview.c
2858: if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT)
2859: elog(ERROR, "SPI_exec failed: %s", querybuf.data);

then I debug, print out querybuf.data:
WITH updt AS (UPDATE public.mv_ivm_agg AS mv SET __ivm_count__ =
mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ , sum =
(CASE WHEN mv.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 AND
diff.__ivm_count_sum__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.sum
IS NULL THEN diff.sum WHEN diff.sum IS NULL THEN mv.sum ELSE (mv.sum
OPERATOR(pg_catalog.+) diff.sum) END), __ivm_count_sum__ =
(mv.__ivm_count_sum__ OPERATOR(pg_catalog.+) diff.__ivm_count_sum__),
count = (mv.count OPERATOR(pg_catalog.+) diff.count), avg = (CASE WHEN
mv.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 AND
diff.__ivm_count_avg__ OPERATOR(pg_catalog.=) 0 THEN NULL WHEN
mv.__ivm_sum_avg__ IS NULL THEN diff.__ivm_sum_avg__ WHEN
diff.__ivm_sum_avg__ IS NULL THEN mv.__ivm_sum_avg__ ELSE
(mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+)
diff.__ivm_sum_avg__)::numeric END) OPERATOR(pg_catalog./)
(mv.__ivm_count_avg__ OPERATOR(pg_catalog.+) diff.__ivm_count_avg__),
__ivm_sum_avg__ = (CASE WHEN mv.__ivm_count_avg__
OPERATOR(pg_catalog.=) 0 AND diff.__ivm_count_avg__
OPERATOR(pg_catalog.=) 0 THEN NULL WHEN mv.__ivm_sum_avg__ IS NULL
THEN diff.__ivm_sum_avg__ WHEN diff.__ivm_sum_avg__ IS NULL THEN
mv.__ivm_sum_avg__ ELSE (mv.__ivm_sum_avg__ OPERATOR(pg_catalog.+)
diff.__ivm_sum_avg__) END), __ivm_count_avg__ = (mv.__ivm_count_avg__
OPERATOR(pg_catalog.+) diff.__ivm_count_avg__) FROM new_delta AS diff
WHERE (mv.i OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i
IS NULL)) RETURNING mv.i) INSERT INTO public.mv_ivm_agg (i, sum,
count, avg, __ivm_count_sum__, __ivm_count_avg__, __ivm_sum_avg__,
__ivm_count__) SELECT i, sum, count, avg, __ivm_count_sum__,
__ivm_count_avg__, __ivm_sum_avg__, __ivm_count__ FROM new_delta AS
diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.i
OPERATOR(pg_catalog.=) diff.i OR (mv.i IS NULL AND diff.i IS NULL)));

At this final SPI_exec, we have a update statement with related
columns { __ivm_count_sum__, sum, __ivm_count__, count, avg,
__ivm_sum_avg__, __ivm_count_avg__}. At this time, my mind stops
working, querybuf.data is way too big, but I still feel like there is
some logic associated with these columns, maybe we can use it as an
assertion to prove that this query (querybuf.len = 1834) is indeed
correct.

Since the apply delta query is quite complex, I feel like adding some
"if debug then print out the final querybuf.data end if" would be a
good idea.

we add hidden columns somewhere, also to avoid corner cases, so maybe
somewhere we should assert total attribute number is sane.

Attachment Content-Type Size
basedon_v29_matview_c_refactor_update_set_clause.patch text/x-patch 20.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-09-01 07:49:21 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Ashutosh Bapat 2023-09-01 07:41:27 Re: persist logical slots to disk during shutdown checkpoint