Re: Optional skipping of unchanged relations during ANALYZE?

From:	Sami Imseih <samimseih(at)gmail(dot)com>
To:	VASUKI M <vasukianand0119(at)gmail(dot)com>
Cc:	Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Christoph Berg <myon(at)debian(dot)org>
Subject:	Re: Optional skipping of unchanged relations during ANALYZE?
Date:	2026-01-23 20:31:26
Message-ID:	CAA5RZ0tYuhHapyVBTw8tVfrKp6fyS5YBTVdQhYGOcWFg-ERyFA@mail.gmail.com
Lists:	pgsql-hackers

Thanks for the detailed summary!

It is important to point out that this feature is trying to do 2 distinct
things in 1 command. It runs ANALYZE when either one of these conditions
is true:

1/ Table has not been analyzed yet.
2/ Table has been modified.
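
To make the two conditions concrete, here is a minimal sketch of the decision as I understand it from this thread. It is not the patch's code. The function name `should_analyze` is hypothetical; the arguments mirror the `last_analyze`, `last_autoanalyze`, and `n_mod_since_analyze` values discussed below.

```python
from datetime import datetime
from typing import Optional


def should_analyze(last_analyze: Optional[datetime],
                   last_autoanalyze: Optional[datetime],
                   n_mod_since_analyze: int) -> bool:
    """Hypothetical helper: True when either condition from the list holds."""
    # Condition 1: neither a manual nor an automatic analyze has ever run.
    never_analyzed = last_analyze is None and last_autoanalyze is None
    # Condition 2: at least one tuple was modified since the last analyze.
    modified = n_mod_since_analyze > 0
    return never_analyzed or modified
```

Written this way, it is easy to see that the two checks are independent, which is why they could plausibly be exposed as separate options.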

> Thanks a lot for the detailed feedback; this has been very helpful. Answering all mails in one.
>
> A few clarifications on intent and scope, and how this relates to the points raised:
>
> Autovacuum overlap
> I agree there is some conceptual overlap with autovacuum’s analyze decision logic.
> The intent here is not to replace or duplicate autovacuum heuristics, but to reduce

Yes, I agree with this.

> I agree that n_mod_since_analyze == 0 is a very simple condition
> and not “smart” in the general sense. That is intentional for now.
> This option is not trying to answer when statistics should be refreshed optimally,
> but only to skip relations that are known to be unchanged since the last analyze.
> If even a single tuple is modified, SMART ANALYZE will still re-run, preserving
> conservative behavior.

Yes, this is my concern. Why would I want to analyze if 1 row or a negligible
number of rows is modified? I understand that this feature is trying to
keep the decision making very simple, but I think it's too simple to actually
be helpful in addressing the wasted effort of an ANALYZE command.
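
For comparison, autovacuum triggers an analyze only when the number of modified tuples exceeds a base threshold plus a fraction of the table size. The sketch below illustrates that style of check; it is not something proposed in this thread. The function name is hypothetical, and the defaults are the stock values of `autovacuum_analyze_threshold` (50) and `autovacuum_analyze_scale_factor` (0.1).

```python
def exceeds_analyze_threshold(n_mod_since_analyze: int,
                              reltuples: float,
                              base_threshold: int = 50,
                              scale_factor: float = 0.1) -> bool:
    """Hypothetical helper: autovacuum-style "enough rows changed" check."""
    # A negative reltuples means the row count is unknown; treat it as 0.
    threshold = base_threshold + scale_factor * max(reltuples, 0.0)
    return n_mod_since_analyze > threshold
```

With these defaults, a table of 1000 rows needs more than 150 modified tuples before the check passes, so a single modified row would not trigger a re-analyze.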

> Tables never analyzed
> As Christoph and Ilia pointed out earlier, skipping tables that were never analyzed would be incorrect.
> The current logic explicitly avoids that by requiring last_analyze or last_autoanalyze to be present
> before skipping. Tables without prior statistics are always analyzed.

I agree with this, but I think it's more than just tables that have not been
analyzed. What if a new column is added after the last (auto)analyze? Would we
not want to trigger an analyze in that case?
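
A rough sketch of the kind of per-column check I have in mind follows. It assumes the caller has already collected the table's column names and the names of columns that currently have statistics (for example from `pg_attribute` and `pg_stats`); the function name and the plain-list inputs are hypothetical.

```python
from typing import Iterable, List


def columns_missing_stats(table_columns: Iterable[str],
                          columns_with_stats: Iterable[str]) -> List[str]:
    """Hypothetical helper: columns that exist but have no statistics yet."""
    have_stats = set(columns_with_stats)
    # Preserve the table's column order in the result.
    return [col for col in table_columns if col not in have_stats]
```

A non-empty result would be a reason to analyze the table even when `n_mod_since_analyze` is 0.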

> Relation to vacuumdb --missing-stats-only
> I agree this is related but slightly different in intent. --missing-stats-only
> answers “does this table have any statistics at all?”, while SMART ANALYZE
> answers “has this table changed since the last statistics collection?”. Both seem
> useful, but they target different use cases. I see SMART ANALYZE primarily
> as a performance optimization for repeated manual ANALYZE runs on mostly-static schemas.

SMART ANALYZE is trying to answer 2 questions: "which table does not have any
statistics at all?" and "has this table changed since the last statistics
collection?", right?

So, maybe they need to be 2 separate options.

> Although, as Sami said, this SMART is not as smart as it should be,
> I will change the name accordingly in further patches

Yup, I am not too fond of SMART in the name. Also, the name itself
is vague. SKIP_LOCKED and BUFFER_USAGE_LIMIT, on the other
hand, tell you exactly what they're used for.

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Re: Optional skipping of unchanged relations during ANALYZE? at 2026-01-23 06:33:27 from VASUKI M