Re: Optional skipping of unchanged relations during ANALYZE?

From: VASUKI M <vasukianand0119(at)gmail(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Christoph Berg <myon(at)debian(dot)org>
Subject: Re: Optional skipping of unchanged relations during ANALYZE?
Date: 2026-01-23 06:33:27
Message-ID: CAE2r8H4+SoMrCXZx987em2VW5tR=N_0xtj28B8R6dzuLon=bzQ@mail.gmail.com
Lists: pgsql-hackers

Hi all,

Thanks a lot for the detailed feedback; this has been very
helpful. I am replying to all the mails in one message.

A few clarifications on intent and scope, and how this relates to the
points raised:

Autovacuum overlap
I agree there is some conceptual overlap with autovacuum’s analyze decision
logic. The intent here is not to replace or duplicate autovacuum
heuristics, but to reduce clearly redundant work during explicit ANALYZE
runs (especially plain ANALYZE across the whole database). Autovacuum
already handles threshold-based decisions well; this option is meant to be
a lightweight, explicit opt-in for manual ANALYZE usage.

Thresholds vs n_mod_since_analyze
I agree that n_mod_since_analyze == 0 is a very simple condition and not
“smart” in the general sense. That is intentional for now. This option is
not trying to answer when statistics should be refreshed optimally, but
only to skip relations that are known to be unchanged since the last
analyze. If even a single tuple is modified, SMART ANALYZE will still
re-run, preserving conservative behavior.

Tables never analyzed
As Christoph and Ilia pointed out earlier, skipping tables that were never
analyzed would be incorrect. The current logic explicitly avoids that by
requiring last_analyze or last_autoanalyze to be present before skipping.
Tables without prior statistics are always analyzed.
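
To make the two conditions above concrete, here is a minimal sketch of the
skip decision in Python. It is only an illustration, not the patch's actual C
code: the RelStats class and the can_skip_analyze function are hypothetical
names, and the fields are assumed to mirror the n_mod_since_analyze,
last_analyze, and last_autoanalyze columns of pg_stat_user_tables.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RelStats:
    """Hypothetical snapshot of one table's cumulative statistics."""
    n_mod_since_analyze: int
    last_analyze: Optional[datetime] = None
    last_autoanalyze: Optional[datetime] = None


def can_skip_analyze(stats: RelStats) -> bool:
    """Return True only when skipping ANALYZE is clearly safe."""
    # A table with no recorded manual or automatic analyze is never skipped.
    ever_analyzed = (
        stats.last_analyze is not None or stats.last_autoanalyze is not None
    )
    if not ever_analyzed:
        return False
    # Even a single modified tuple forces a fresh ANALYZE.
    return stats.n_mod_since_analyze == 0
```

Under these assumptions, a table is skipped only when both checks pass; a
never-analyzed table or one with any modification is always analyzed.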

Relation to vacuumdb --missing-stats-only
I agree this is related but slightly different in intent.
--missing-stats-only answers “does this table have any statistics at all?”,
while SMART ANALYZE answers “has this table changed since the last
statistics collection?”. Both seem useful, but they target different use
cases. I see SMART ANALYZE primarily as a performance optimization for
repeated manual ANALYZE runs on mostly-static schemas.

Extended statistics / partitions / inheritance
These are valid concerns. The current patch intentionally does not attempt
to handle extended statistics, partitioned tables, inheritance, foreign
tables, etc. I wanted to start with a minimal, explicit, and conservative
behavior for regular relations only. I agree these areas need careful
consideration before extending the logic further, and I plan to look into
them based on feedback.

VACUUM vs ANALYZE
I also agree with the concern about adding more options to VACUUM. The
current patch focuses on ANALYZE usage; I’m not proposing this as a VACUUM
option.

Naming
As Sami said, this SMART option is not as smart as the name suggests. I
will rename it in later patches once a name is agreed on, based on your and
others' opinions.

Based on feedback, I’m happy to revise direction, naming, or scope before
taking this further.

Thanks again for the thoughtful discussion; I really appreciate the guidance.

Best regards,
Vasuki M
C-DAC, Chennai.
