| From: | Adam Brusselback <adambrusselback(at)gmail(dot)com> |
|---|---|
| To: | Kirk Wolak <wolakk(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW |
| Date: | 2025-12-09 16:27:58 |
| Message-ID: | CAMjNa7fJUwcOxf+qV8g+tCQ-3E-YAiKgE_Qs6u-xjdxe12T0SQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> Our situation was a wonderful MV with all the columns we needed (some
> hard to calculate) to augment search data done millions of times/day. It
> was a thing of beauty. Until we realized we needed to update 1 record
> (vendor inventory UPDATE date/time) every time we processed a file
> (something we do 24x7, a hundred times each hour!
>
> For that ONE field, we ended up doing REFRESH MV concurrently; OVER
> 2,000 times per day.
Thanks for the feedback and the use case, Kirk.
Regarding that specific scenario where a single column ("last updated" or
similar) churns significantly faster than the heavy-computation columns:
Even with this patch, you might find it beneficial to separate that
high-velocity column into its own small materialized view (or regular view)
and join it to the main MV at query time. That will reduce the bloat you
get on the main MV by quite a lot, especially if you have very wide rows
(which it seems like you do).
I initially tried to implement logic that would allow for direct UPDATEs
(which would enable HOT updates). However, to handle rows that matched the
predicate but were no longer present in the new source data, I had to run
an anti-join to identify them for deletion. That approach caused
performance issues, so I settled on the "Prune + Upsert" strategy (DELETE
matching rows, then INSERT from source).
Because this patch performs a delete/insert cycle, updating that one
timestamp column will still result in rewriting the whole tuple in the MV.
> For that ONE field, we ended up doing REFRESH MV concurrently; OVER
> 2,000 times per day.
That said, 2,000 refreshes per day is nothing for this implementation,
provided your updates are selective enough and your queries allow for
predicate push-down to the base tables.
I look forward to your thoughts after reviewing the code.
Thanks,
Adam Brusselback
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alvaro Herrera | 2025-12-09 16:30:29 | Re: log_min_messages per backend type |
| Previous Message | Srinath Reddy Sadipiralla | 2025-12-09 16:27:08 | Re: pg_dump:qemu: uncaught target signal 7 (Bus error) - core dumped |