| From: | Zizhuan Liu <44973863(at)qq(dot)com> |
|---|---|
| To: | cca5507 <cca5507(at)qq(dot)com>, David Geier <geidav(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Cc: | 我自己的邮箱 <44973863(at)qq(dot)com> |
| Subject: | Re: Avoid calling SetMatViewPopulatedState if possible |
| Date: | 2026-05-28 04:31:34 |
| Message-ID: | tencent_64F9B79FA511757D232E7C5D1B3D2DC24406@qq.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
HI, ChangAo Chen, David Geier , hackers
>> But refreshing a materialized view doesn't only change relispopulated
>> but also columns like relfilenode, relpages, relhasindex, etc. Doesn't
>> changing these columns during REFRESH MATERIALIZED VIEW make your
>> optimization applicable in a lot less cases
>
>I don't think so. If we can skip SetMatViewPopulatedState(), we avoid generating
>a dead pg_class tuple in all cases.
>
>> I'm actually wondering why it works at all, even in the example you
>> gave. Because I thought that even when nothing has changed the pg_class
>> row is updated for more columns than just relispopulated.
>
>"refresh materialized view concurrently" is done by doing DELETE + INSERT
>to the matview directly, so only relispopulated will change before the patch.
>After the patch, the pg_class row don't change anymore.
1.Regarding pg_class.relispopulated
This column was introduced in PostgreSQL 9.3 specifically for materialized views.
Conceptually, the state of "IsScannable" is equivalent to and fully dependent on relispopulated.
This flag(relispopulated) is also used across multiple features, including COPY/pg_repack/pg_dump.
For the definitions of the populated and scannable states of materialized views, please refer to the official documentation:
https://www.postgresql.org/docs/devel/sql-creatematerializedview.html
https://www.postgresql.org/docs/devel/sql-refreshmaterializedview.html
2.Current status
As ChangAo Chen mentioned and verified via tests, and based on my own research and testing,
I agree with this conclusion:
If pg_class.relispopulated does not actually change, there is *no need* to invoke SetMatViewPopulatedState().
Skipping this call will also avoid executing the subsequent call chain:
CatalogTupleUpdate() -> simple_heap_update() -> heap_update().
This helps prevent dead tuples from being generated, and also reduces overhead on underlying mechanisms
including locks, transactions, WAL (xlog), as well as IPC contention and synchronization.
3.Regarding updates to attributes in pg_class
> But refreshing a materialized view doesn't only change relispopulated
> but also columns like relfilenode, relpages, relhasindex, etc. Doesn't
> changing these columns during REFRESH MATERIALIZED VIEW make your
> optimization applicable in a lot less cases?
To clarify: SetMatViewPopulatedState() is dedicated solely to modifying relispopulated.
Columns such as relfilenode, relpages and relhasindex are updated by other separate routines.
4. About SetMatViewPopulatedState() and its caller
There are two possible ways to implement this optimization:
--The first approach is to refactor SetMatViewPopulatedState() to handle the check internally:
If ((Form_pg_class) GETSTRUCT(tuple))->relispopulated != newstate, then perform the actual update; otherwise, do nothing.
--The second approach is to perform the check inside the caller RefreshMatViewByOid(), as implemented in ChangAo Chen’s patch.
I also noticed that intorel_startup() in createas.c follows the same design pattern — letting the caller handle the check before invoking the callee.
5. A small suggestion
After further consideration, I suggest using the condition:
if (RelationIsPopulated(matviewRel) != !skipData)
This is much more readable and intuitive compared to the original:
if (RelationIsPopulated(matviewRel) == skipData)
Especially when paired with the original call:
SetMatViewPopulatedState(matviewRel, !skipData);
Additionally, avoiding unnecessary updates and thus preventing dead tuples is especially valuable in BI/OLAP environments that use many materialized views and refresh them frequently (daily or hourly).
6. About Testing
--The new patch passes all PostgreSQL regression tests:
xman(at)xman_lzz:~/patch_1/build/testrun/regress/regress$ grep -n matview regression.out
128:# parallel group (20 tests): init_privs drop_operator security_label password lock object_address tablesample collate groupingsets replica_identity spgist gin matview identity gist generated_stored rowsecurity brin join_hash privileges
137:ok 126 + matview 47396 ms
xman(at)xman_lzz:~/patch_1/build/testrun/regress/regress$
--I have also checked the output files and expected files for the materialized view regression tests, and they are identical:
xman(at)xman_lzz:~$ ls -l /home/xman/patch_1/postgres/src/test/regress/expected/matview.out
-rw-rw-r-- 1 xman xman 24946 May 25 22:29 /home/xman/patch_1/postgres/src/test/regress/expected/matview.out
xman(at)xman_lzz:~$ ls -l /home/xman/patch_1/build/testrun/regress/regress/results/matview.out
-rw-rw-r-- 1 xman xman 24946 May 27 23:05 /home/xman/patch_1/build/testrun/regress/regress/results/matview.out
xman(at)xman_lzz:~$ diff /home/xman/patch_1/postgres/src/test/regress/expected/matview.out /home/xman/patch_1/build/testrun/regress/regress/results/matview.out
xman(at)xman_lzz:~$ echo $?
0
xman(at)xman_lzz:~$
The above is what I have found so far. Please feel free to correct me if I have any misunderstandings.
Thanks,
--
Zizhuan Liu (X-MAN)
44973863(at)QQ(dot)COM
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Avoid-calling-SetMatViewPopulatedState-if-possibl.patch | application/octet-stream | 957 bytes |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Zhijie Hou (Fujitsu) | 2026-05-28 05:18:34 | RE: Adding REPACK [concurrently] |
| Previous Message | Amit Kapila | 2026-05-28 03:34:08 | Re: Adding REPACK [concurrently] |