| From: | vellaipandiyan sm <vellaipandiyan(dot)sm(at)gmail(dot)com> |
|---|---|
| To: | Adam Brusselback <adambrusselback(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW |
| Date: | 2026-05-21 05:17:55 |
| Message-ID: | CAGXjcjn3Mf2QpAz-jqkNQ=uJcjpShDtZNPn0B6v_R2YcmQ=m3A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello hackers,
I reviewed the REFRESH MATERIALIZED VIEW ... WHERE patch and had a few
questions around concurrency semantics.
- The original DELETE -> INSERT approach exposing a consistency gap makes
sense, especially once tuple locks disappear after DELETE. The newer FOR
UPDATE + single-CTE approach seems safer, though I wonder whether
overlapping refreshes could still encounter deadlock scenarios around
UPSERT conflicts.
- The CONCURRENTLY behavior also feels somewhat unintuitive here. With
WHERE refreshes, the non-CONCURRENT path appears more permissive for
writers than CONCURRENTLY WHERE, which seems opposite to the expectation
established by normal REFRESH MATERIALIZED VIEW semantics.
- It may also help to document the intended guarantees around overlapping
partial refreshes and concurrent DML on base tables.
Overall, the use case seems quite valuable for selective high-churn refresh
workloads.
Thanks for working on this patch.
Regards,
Vellaipandiyan
On Thu, May 21, 2026 at 10:44 AM Adam Brusselback <adambrusselback(at)gmail(dot)com>
wrote:
> Attached is a patch implementing support for a WHERE clause in REFRESH
> MATERIALIZED VIEW.
>
> The syntax allows for targeted refreshes:
> REFRESH MATERIALIZED VIEW mv WHERE invoice_id = ANY('{1,2,3}');
> REFRESH MATERIALIZED VIEW CONCURRENTLY mv WHERE customer_id = 42;
> REFRESH MATERIALIZED VIEW mv WHERE order_date >= '2023-01-01';
>
> I was inspired to implement this feature after watching the Hacking
> Postgres discussion on the topic:
> https://www.youtube.com/watch?v=6cZvHjDrmlQ
>
> This allows the user to restrict the refresh operation to a subset of the
> view. The qualification is applied to the view's output columns. The
> optimizer can then push this condition down to the underlying base tables,
> avoiding a full scan when only a known subset of data has changed.
>
> Implementation notes:
>
> 1. The grammar accepts an optional WHERE clause. We forbid volatile
> functions in the clause to ensure correctness.
>
> 2. Non-Concurrent Partial Refresh: When `CONCURRENTLY` is not specified,
> the operation performs an in-place modification using a `ROW EXCLUSIVE`
> lock.
> * This mode requires a unique index to ensure constraint violations
> are handled correctly (e.g., when a row's values change such that it
> "drifts" into or out of the `WHERE` clause scope).
> * It executes a Prune + Upsert strategy:
> * `DELETE` all rows in the materialized view that match the
> `WHERE` clause.
> * `INSERT` the new data from the source query.
> * It uses `ON CONFLICT DO UPDATE` during the insert phase to handle
> concurrency edge cases, ensuring the refresh is robust against constraint
> violations.
>
> 3. Concurrent Partial Refresh: When `CONCURRENTLY` is specified, it uses
> the existing diff/merge infrastructure (`refresh_by_match_merge`), limiting
> the scope of the diff (and the temporary table population) to the rows
> matching the predicate. This requires an `EXCLUSIVE` lock and a unique
> index, consistent with existing concurrent refresh behavior. It is much
> slower than `Non-Concurrent Partial Refresh`
>
> 4. The execution logic uses SPI to inject the predicate into the source
> queries during execution.
>
> I have attached a benchmark suite to validate performance and correctness:
>
> * `setup.sql`: Creates a schema `mv_benchmark` modeling an invoicing
> system (`invoices` and `invoice_lines`). It includes an aggregated
> materialized view (`invoice_summary`) and a control table
> (`invoice_summary_table`).
> * `workload_*.sql`: pgbench scripts simulating a high-churn environment
> (45% inserts, 10% updates, 45% deletes) to maintain roughly stable dataset
> sizes while generating significant refresh work.
> * `run_benchmark_comprehensive.sh`: Orchestrates the benchmark across
> multiple scale factors and concurrency levels.
>
> The benchmark compares strategies for keeping a summary up to date (vs
> baseline):
> * Partial Refresh: Triggers on the base table collect modified IDs and
> execute `REFRESH MATERIALIZED VIEW ... WHERE ...`.
> * Materialized Table (Control): A standard table maintained via complex
> PL/pgSQL triggers (the traditional manual workaround).
> * Full Refresh (Legacy): Manually refresh the view after changes.
>
> Results are below:
> Concurrency: 1 client(s)
>
> ----------------------------------------------------------------------------------
> Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
> ---------- ------ | ------------ | ------------ ------------ ------------
> 20000 1 | 5309.05 | 0.002x 0.437x 0.470x
>
> 20000 50 | 1209.32 | 0.010x 0.600x 0.598x
>
> 20000 1000 | 56.05 | 0.164x 0.594x 0.576x
>
> 400000 1 | 5136.91 | 0 x 0.450x 0.487x
>
> 400000 50 | 1709.17 | 0 x 0.497x 0.482x
>
> 400000 1000 | 110.35 | 0.006x 0.507x 0.460x
>
>
> Concurrency: 4 client(s)
>
> ----------------------------------------------------------------------------------
> Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
> ---------- ------ | ------------ | ------------ ------------ ------------
> 20000 1 | 19197.50 | 0x 0.412x 0.435x
>
> 20000 50 | 1016.14 | 0.007x 0.966x 1.036x
>
> 20000 1000 | 9.94 | 0.708x 1.401x 1.169x
>
> 400000 1 | 19637.36 | 0x 0.436x 0.483x
>
> 400000 50 | 4669.32 | 0x 0.574x 0.566x
>
> 400000 1000 | 23.26 | 0.029x 1.147x 0.715x
>
>
> Concurrency: 8 client(s)
>
> ----------------------------------------------------------------------------------
> Scale Batch | Baseline TPS | Full (Rel) Partial (Rel) Table (Rel)
> ---------- ------ | ------------ | ------------ ------------ ------------
> 20000 1 | 30358.32 | 0x 0.440x 0.457x
> 20000 50 | 262.75 | 0.026x 2.943x 2.740x
> 20000 1000 | 11.28 | 0.575x 0.840x 0.578x
> 400000 1 | 36007.15 | 0x 0.430x 0.464x
> 400000 50 | 6664.58 | 0x 0.563x 0.494x
> 400000 1000 | 11.61 | 0.058x 1.000x 1.277x
>
>
>
> In these tests, the partial refresh behaves as O(delta) rather than
> O(total), performing comparably to the manual PL/pgSQL approach but with
> significantly lower code complexity for the user.
>
> I recognize that adding a WHERE clause to REFRESH is an extension to the
> SQL standard. I believe the syntax is intuitive, but I am open to
> discussion regarding alternative implementation strategies or syntax if the
> community feels a different approach is warranted.
>
> New regression tests are included in the patch.
>
> This is my first time submitting a patch to PostgreSQL, so please bear
> with me if I've missed anything or made any procedural mistakes. I'm happy
> to address any feedback.
>
> Thanks,
> Adam Brusselback
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | vellaipandiyan sm | 2026-05-21 05:38:33 | Re: RFC: Allow EXPLAIN to Output Page Fault Information |
| Previous Message | vignesh C | 2026-05-21 05:03:09 | Re: Set notice receiver before libpq connection startup |