| From: | VASUKI M <vasukianand0119(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Cc: | Sami Imseih <samimseih(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Christoph Berg <myon(at)debian(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se> |
| Subject: | Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds |
| Date: | 2026-02-17 10:42:58 |
| Message-ID: | CAE2r8H4KRCJ055utU4u+3rBYSgAmiFgMgswaBMN_iOx16iTubQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi all,
Following up on the recent discussion around ANALYZE
(MISSING_STATS_ONLY):[1]
I would like to start a separate discussion about a potential ANALYZE
(MODIFIED_STATS) option.
The idea is to allow manual ANALYZE to reuse the same threshold logic that
autoanalyze uses, so that when a user explicitly runs ANALYZE, only
relations that have crossed the modification threshold are processed.
Conceptually, this would use the existing formula:
analyze threshold = analyze_base_threshold
+ analyze_scale_factor * reltuples
and compare it against n_mod_since_analyze, similar to how autovacuum
decides when to trigger analyze.
The goal is not to replace autoanalyze, but to expose its decision model at
SQL level for deterministic, user-controlled execution. For example:
-Running maintenance in scripted environments
-Triggering analysis immediately after batch data loads
-Avoiding unnecessary work when running manual ANALYZE across many relations
- Environments where autovacuum is tuned conservatively or partially
disabled
Autoanalyze runs opportunistically in the background. This proposal would
allow a user to apply the same threshold logic explicitly and immediately.
A possible usage would look like: ANALYZE (MODIFIED_STATS);
I understand there is conceptual overlap with autovacuum, so I would
especially appreciate feedback on:
-Whether exposing the threshold logic at SQL level makes sense
architecturally
-Whether this should remain an explicit opt-in option
-Naming (e.g., MODIFIED_STATS vs SKIP_UNMODIFIED or something clearer)
-Whether the thresholds should reuse existing GUCs or accept per-command
overrides
I intentionally kept this separate from MISSING_STATS_ONLY, since that
option answers a different question (“are stats missing?”) while this one
would answer (“have enough rows changed to justify re-analysis?”).
I would greatly appreciate thoughts before working on a prototype patch.
Thanks again for all the feedback so far — it has been very helpful
,expecting here the same.
Regards,
Vasuki M
C-DAC,Chennai
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Soumya S Murali | 2026-02-17 10:47:48 | Re: [PATCH] Expose checkpoint reason to completion log messages. |
| Previous Message | Kirill Reshke | 2026-02-17 10:14:45 | Re: Release and unpin buffers after leaving CRIT section in GIN. |