Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

From: VASUKI M <vasukianand0119(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Sami Imseih <samimseih(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, Christoph Berg <myon(at)debian(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds
Date: 2026-02-17 10:42:58
Message-ID: CAE2r8H4KRCJ055utU4u+3rBYSgAmiFgMgswaBMN_iOx16iTubQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

Following up on the recent discussion around ANALYZE
(MISSING_STATS_ONLY):[1]

I would like to start a separate discussion about a potential ANALYZE
(MODIFIED_STATS) option.

The idea is to allow manual ANALYZE to reuse the same threshold logic that
autoanalyze uses, so that when a user explicitly runs ANALYZE, only
relations that have crossed the modification threshold are processed.

Conceptually, this would use the existing formula:

analyze threshold = analyze_base_threshold
+ analyze_scale_factor * reltuples

and compare it against n_mod_since_analyze, similar to how autovacuum
decides when to trigger analyze.

The goal is not to replace autoanalyze, but to expose its decision model at
SQL level for deterministic, user-controlled execution. For example:

-Running maintenance in scripted environments
-Triggering analysis immediately after batch data loads
-Avoiding unnecessary work when running manual ANALYZE across many relations
- Environments where autovacuum is tuned conservatively or partially
disabled

Autoanalyze runs opportunistically in the background. This proposal would
allow a user to apply the same threshold logic explicitly and immediately.

A possible usage would look like: ANALYZE (MODIFIED_STATS);

I understand there is conceptual overlap with autovacuum, so I would
especially appreciate feedback on:

-Whether exposing the threshold logic at SQL level makes sense
architecturally
-Whether this should remain an explicit opt-in option
-Naming (e.g., MODIFIED_STATS vs SKIP_UNMODIFIED or something clearer)
-Whether the thresholds should reuse existing GUCs or accept per-command
overrides

I intentionally kept this separate from MISSING_STATS_ONLY, since that
option answers a different question (“are stats missing?”) while this one
would answer (“have enough rows changed to justify re-analysis?”).

I would greatly appreciate thoughts before working on a prototype patch.

Thanks again for all the feedback so far — it has been very helpful
,expecting here the same.

Regards,
Vasuki M
C-DAC,Chennai

[1][
https://www.postgresql.org/message-id/CAE2r8H61ZtT4ek3JmLKDPmr7ALQ0uE9WSwwJRFHbXm0WdOJnEQ@mail.gmail.com
]

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Soumya S Murali 2026-02-17 10:47:48 Re: [PATCH] Expose checkpoint reason to completion log messages.
Previous Message Kirill Reshke 2026-02-17 10:14:45 Re: Release and unpin buffers after leaving CRIT section in GIN.