| From: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
|---|---|
| To: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
| Cc: | Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "zmlpostgres(at)gmail(dot)com" <zmlpostgres(at)gmail(dot)com> |
| Subject: | Re: Incremental View Maintenance, take 2 (design considerations) |
| Date: | 2026-05-29 14:14:17 |
| Message-ID: | 20260529231417.7cd817f97347dfa048608517@sraoss.co.jp |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
I've attached rebased patches.
Thanks again to Alexandre Felipe for the previous rebase work.
Recent fixes
-------------
The attached patches also include fixes for several issues found in pg_ivm [1]:
* Handling base tables with dropped columns
* Fixing incorrect maintenance when the same table is modified multiple
times by triggers
* Preventing view inconsistency when a concurrent transaction modifies
a base table while creating or refreshing an IMMV
To address the view inconsistency issue, I introduced a shared hash table
that stores, for each IMMV, the transaction ID of the transaction that last
updated the view.
Currently, the size of the hash table is fixed, but it should be possible to
make it configurable via a GUC. Another possibility would be to use a dynamically
managed shared hash table, similar to ApplyLauncher{Set,Get}WorkerStartTime().
Possible simplifications
------------------------
In addition, I’m considering some further changes to simplify the patch set
and make it easier to review.
1. Use a reloption to indicate IVM
Currently, we use the INCREMENTAL keyword to create an IMMV, but I wonder
whether it would be better to use a reloption instead, for example:
CREATE MATERIALIZED VIEW ... WITH (incremental)
This might be preferable to adding a new field to pg_class or introducing
new syntax keywords.
2. Remove DISTINCT/aggregate support from the scope of the initial release
Although aggregation is a major use case for IVM, I think it may be better to
initially focus on simple SPJ views, in order to validate the core maintenance
design before tackling more complicated cases such as DISTINCT and aggregates.
This would allow us to avoid some difficult parts for now, such as handling
hidden columns for duplicate tuple counting and the more complex maintenance
logic currently implemented using SPI.
Furthermore, if we assume that each base table has a primary key (or replica
identity other than FULL), and all such columns appear in the target list,
the resulting view cannot contain duplicate tuples, which would simplify the
maintenance logic considerably.
3. Use transition tables to reconstruct the "pre-state" of tables
The state of a table before modification is required when multiple base tables
are modified within a single query.
The current patch handles this using a special function, ivm_visible_in_prestate(),
in the WHERE clause. This function checks whether a row is visible in a snapshot
captured before modification in a BEFORE trigger.
However, this approach feels somewhat awkward to me. Instead, it may be better to
reconstruct the pre-update state using transition tables (OLD/NEW). For example,
the pre-state relation could be reconstructed as follows:
SELECT * FROM tbl
UNION ALL
SELECT * FROM old_table
EXCEPT ALL
SELECT * FROM new_table;
(ALL is necessary to preserve tuple multiplicity.)
Other design considerations
---------------------------
Other design considerations were also presented in the poster [2] at
PGConf.dev 2026.
The goal was not to propose a complete design or implementation. Rather, I
wanted to identify open architectural questions and gather feedback on
possible directions and trade-offs where community consensus may eventually
be needed.
The poster discussed the following design considerations:
(1) Refresh timing
The current patch set implements an "immediate" approach, where the view is
updated immediately after each modification of a base table.
One advantage of this approach is that it does not require additional
infrastructure to track changes on base tables. However, it can have a
significant impact on update performance, and it also needs to address
concurrency issues during view maintenance, as reflected in the latest fixes
mentioned above.
In contrast, a "deferred" approach updates the view later, either asynchronously
or on demand. This could improve update performance, but it would require some
mechanism to track changes on base tables, which would introduce additional
complexity.
The current patch set uses an "immediate" approach mainly because it appears
to require less additional infrastructure and provide a simpler starting point
for experimentation.
(2) Maintenance placement and catalog design
The current patch set is trigger-based. Transition tables are used to capture
changes, and the materialized view is updated from the trigger. There is no
dedicated catalog that directly records the relationship between views and base
tables; instead, the relationship is represented indirectly through pg_trigger.
One advantage of this approach is that it reuses existing infrastructure, so it
does not require major executor changes.
On the other hand, it may be possible to implement IVM directly in the executor
without relying on triggers, and to manage the view-table relationship using a
dedicated catalog (for example, something like pg_matview_tables), similar to
how declarative partitioning is implemented.
The current implementation is trigger-based mainly because it reuses existing
infrastructure and seemed closer to how RI triggers for foreign key constraints
work. However, this approach may introduce additional complexity when nested
queries are executed from other triggers (such as user-defined triggers or RI
triggers).
It is also possible that integrating the IVM mechanism more directly into the
executor could simplify some aspects of the implementation. At this point, I
think both approaches still have trade-offs that need further evaluation.
In addition, I have been wondering whether it would be useful to introduce a
dedicated catalog or statistics view to store materialized view status information,
such as the last refresh time or staleness.
For example, such information could help users understand how stale a
materialized view is. It might also be useful for future planner optimizations
involving materialized views, such as query rewrite or automatic view substitution.
So, I plan to explore this further independently from the current IVM implementation
work.
(3) Capturing pre-update state (before changes)
As discussed above, the current patch uses a special function to access the
pre-update state of a table, and I think it would be better to reconstruct that
state using transition tables instead.
Another possibility would be to introduce new infrastructure to apply an older
snapshot selectively to a table scan (somewhat similar to Oracle's AS OF
functionality). However, I am not convinced that such infrastructure is really
necessary for this use case.
(4) Initial feature scope
The current patch includes support for DISTINCT, aggregates, and duplicate tuples.
However, as discussed above, I think it may be better to initially focus on simple
SPJ views.
There are still various design trade-offs to consider, so comments and discussion
on the overall direction would be greatly appreciated.
Regards,
Yugo Nagata
[1] https://github.com/sraoss/pg_ivm
[2] https://2026.pgconf.dev/posters/ivm.pdf
On Sun, 22 Feb 2026 23:41:17 +0900
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
> On Mon, 16 Feb 2026 21:43:09 +0000
> Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com> wrote:
>
> Thank you for looking over and updating the patches!
> I’m planning an overhaul of the patch set, but I really appreciate your time
> and effort in getting them up to date.
>
> > Sorry,
> > the previous line missed the removal of the function declared but not used.
> >
> > Regards,
> > Alexandre
> >
> > On Mon, Feb 16, 2026 at 4:07 PM Alexandre Felipe <
> > o(dot)alexandre(dot)felipe(at)gmail(dot)com> wrote:
> >
> > > There was a warning on my initial rebase, so I fixed that.
> > >
> > > I also changed the bitmap set to a list, I don't think we need O(1) lookup
> > > here as suggested by Zhang [1] on patch 6.
> > >
> > > Yugo,
> > > I think there is an issue in
> > > src/backend/commands/matview.c, IVM_immediate_maintenance, line 1688, when
> > > apply_delta fails, and PG_RE_THROW is called, wouldn't we have to cleanup?
> > > As in line 1699 onwards?
>
> Do you mean calling clean_up_IVM_hash_entry() as part of the cleanup?
> I would need to look into this more carefully, but my understanding is that i
> might be handled by AtAbort_IVM() in that situation.
>
> Regards,
> Yugo Nagata
>
> > >
> > >
> > >
> > > Regards,
> > > Alexandre
> > >
> > > On Thu, Feb 12, 2026 at 6:08 PM Alexandre Felipe <
> > > o(dot)alexandre(dot)felipe(at)gmail(dot)com> wrote:
> > >
> > >> Sorry for creating a new thread for this.
> > >> I don't have the original email.
> > >>
> > >> This is my attempt on rebasing
> > >> https://commitfest.postgresql.org/patch/4337/
> > >>
> > >> Regards,
> > >> Alexandre
> > >>
> > >
>
>
> --
> Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
>
>
--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
| Attachment | Content-Type | Size |
|---|---|---|
| v37-0011-Add-documentations-about-Incremental-View-Mainte.patch | text/x-diff | 24.7 KB |
| v37-0010-Add-regression-tests-for-Incremental-View-Mainte.patch | text/x-diff | 61.9 KB |
| v37-0009-Add-support-for-min-max-aggregates-for-IVM.patch | text/x-diff | 26.5 KB |
| v37-0008-Add-aggregates-support-in-IVM.patch | text/x-diff | 37.8 KB |
| v37-0007-Add-DISTINCT-support-for-IVM.patch | text/x-diff | 24.6 KB |
| v37-0006-Add-Incremental-View-Maintenance-support.patch | text/x-diff | 91.2 KB |
| v37-0005-Add-Incremental-View-Maintenance-support-to-psql.patch | text/x-diff | 5.7 KB |
| v37-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch | text/x-diff | 4.2 KB |
| v37-0003-Allow-to-prolong-life-span-of-transition-tables-.patch | text/x-diff | 6.0 KB |
| v37-0002-Add-relisivm-column-to-pg_class-system-catalog.patch | text/x-diff | 6.0 KB |
| v37-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch | text/x-diff | 5.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nathan Bossart | 2026-05-29 15:14:29 | Re: future of PQfn() |
| Previous Message | Ethan Mertz | 2026-05-29 13:42:58 | Re: [PATCH] Improving index selection for logical replication apply with replica identity full |