Quick Links

[Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW

From:	Adam Brusselback <adambrusselback(at)gmail(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	[Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW
Date:	2025-12-08 20:58:27
Message-ID:	CAMjNa7eFzTQ5=oZMQiB2bMkez5KP4A77JC7SRjeVEkOrh7cUHw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Attached is a patch implementing support for a WHERE clause in REFRESH
MATERIALIZED VIEW.

The syntax allows for targeted refreshes:
REFRESH MATERIALIZED VIEW mv WHERE invoice_id = ANY('{1,2,3}');
REFRESH MATERIALIZED VIEW CONCURRENTLY mv WHERE customer_id = 42;
REFRESH MATERIALIZED VIEW mv WHERE order_date >= '2023-01-01';

I was inspired to implement this feature after watching the Hacking
Postgres discussion on the topic:
https://www.youtube.com/watch?v=6cZvHjDrmlQ

This allows the user to restrict the refresh operation to a subset of the
view. The qualification is applied to the view's output columns. The
optimizer can then push this condition down to the underlying base tables,
avoiding a full scan when only a known subset of data has changed.

Implementation notes:

1. The grammar accepts an optional WHERE clause. We forbid volatile
functions in the clause to ensure correctness.

2. Non-Concurrent Partial Refresh: When `CONCURRENTLY` is not specified,
the operation performs an in-place modification using a `ROW EXCLUSIVE`
lock.
* This mode requires a unique index to ensure constraint violations
are handled correctly (e.g., when a row's values change such that it
"drifts" into or out of the `WHERE` clause scope).
* It executes a Prune + Upsert strategy:
* `DELETE` all rows in the materialized view that match the `WHERE`
clause.
* `INSERT` the new data from the source query.
* It uses `ON CONFLICT DO UPDATE` during the insert phase to handle
concurrency edge cases, ensuring the refresh is robust against constraint
violations.

3. Concurrent Partial Refresh: When `CONCURRENTLY` is specified, it uses
the existing diff/merge infrastructure (`refresh_by_match_merge`), limiting
the scope of the diff (and the temporary table population) to the rows
matching the predicate. This requires an `EXCLUSIVE` lock and a unique
index, consistent with existing concurrent refresh behavior. It is much
slower than `Non-Concurrent Partial Refresh`

4. The execution logic uses SPI to inject the predicate into the source
queries during execution.

I have attached a benchmark suite to validate performance and correctness:

* `setup.sql`: Creates a schema `mv_benchmark` modeling an invoicing
system (`invoices` and `invoice_lines`). It includes an aggregated
materialized view (`invoice_summary`) and a control table
(`invoice_summary_table`).
* `workload_*.sql`: pgbench scripts simulating a high-churn environment
(45% inserts, 10% updates, 45% deletes) to maintain roughly stable dataset
sizes while generating significant refresh work.
* `run_benchmark_comprehensive.sh`: Orchestrates the benchmark across
multiple scale factors and concurrency levels.

The benchmark compares strategies for keeping a summary up to date (vs
baseline):
* Partial Refresh: Triggers on the base table collect modified IDs and
execute `REFRESH MATERIALIZED VIEW ... WHERE ...`.
* Materialized Table (Control): A standard table maintained via complex
PL/pgSQL triggers (the traditional manual workaround).
* Full Refresh (Legacy): Manually refresh the view after changes.

Results are below:
Concurrency: 1 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 5309.05 | 0.002x 0.437x 0.470x
20000 50 | 1209.32 | 0.010x 0.600x 0.598x
20000 1000 | 56.05 | 0.164x 0.594x 0.576x
400000 1 | 5136.91 | 0 x 0.450x 0.487x
400000 50 | 1709.17 | 0 x 0.497x 0.482x
400000 1000 | 110.35 | 0.006x 0.507x 0.460x

Concurrency: 4 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 19197.50 | 0x 0.412x 0.435x
20000 50 | 1016.14 | 0.007x 0.966x 1.036x
20000 1000 | 9.94 | 0.708x 1.401x 1.169x
400000 1 | 19637.36 | 0x 0.436x 0.483x
400000 50 | 4669.32 | 0x 0.574x 0.566x
400000 1000 | 23.26 | 0.029x 1.147x 0.715x

Concurrency: 8 client(s)
----------------------------------------------------------------------------------
Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
---------- ------ | ------------ | ------------ ------------ ------------
20000 1 | 30358.32 | 0x 0.440x 0.457x
20000 50 | 262.75 | 0.026x 2.943x 2.740x
20000 1000 | 11.28 | 0.575x 0.840x 0.578x
400000 1 | 36007.15 | 0x 0.430x 0.464x
400000 50 | 6664.58 | 0x 0.563x 0.494x
400000 1000 | 11.61 | 0.058x 1.000x 1.277x

In these tests, the partial refresh behaves as O(delta) rather than
O(total), performing comparably to the manual PL/pgSQL approach but with
significantly lower code complexity for the user.

I recognize that adding a WHERE clause to REFRESH is an extension to the
SQL standard. I believe the syntax is intuitive, but I am open to
discussion regarding alternative implementation strategies or syntax if the
community feels a different approach is warranted.

New regression tests are included in the patch.

This is my first time submitting a patch to PostgreSQL, so please bear with
me if I've missed anything or made any procedural mistakes. I'm happy to
address any feedback.

Thanks,
Adam Brusselback

Attachment	Content-Type	Size
benchmark.zip	application/zip	7.1 KB
001_implement_mat_view_where.patch	text/x-patch	61.8 KB

Responses

Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW at 2025-12-09 05:08:58 from Kirk Wolak

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2025-12-08 21:19:54	Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)
Previous Message	Peter Geoghegan	2025-12-08 20:50:56	Re: index prefetching