Re: Optional skipping of unchanged relations during ANALYZE?

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: VASUKI M <vasukianand0119(at)gmail(dot)com>
Cc: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Christoph Berg <myon(at)debian(dot)org>
Subject: Re: Optional skipping of unchanged relations during ANALYZE?
Date: 2026-01-23 20:31:26
Message-ID: CAA5RZ0tYuhHapyVBTw8tVfrKp6fyS5YBTVdQhYGOcWFg-ERyFA@mail.gmail.com
Lists: pgsql-hackers

Thanks for the detailed summary!

It is important to point out that this feature is trying to do 2 distinct
things in 1 command. It runs ANALYZE when either one of these conditions
is true:

1/ Table has not been analyzed yet.
2/ Table has been modified.
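
To make the two conditions concrete, here is a rough sketch of the decision
as I read it from the thread. It is not the patch's code. The field names
mirror the pg_stat_all_tables columns mentioned in this thread; the class and
function names are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TableStats:
    """Hypothetical holder for a few pg_stat_all_tables columns."""
    last_analyze: Optional[datetime]
    last_autoanalyze: Optional[datetime]
    n_mod_since_analyze: int


def never_analyzed(stats: TableStats) -> bool:
    # Condition 1/: neither a manual nor an automatic analyze is recorded.
    return stats.last_analyze is None and stats.last_autoanalyze is None


def modified_since_analyze(stats: TableStats) -> bool:
    # Condition 2/: at least one tuple changed since the last analyze.
    return stats.n_mod_since_analyze > 0


def should_analyze(stats: TableStats) -> bool:
    # The proposed option folds both conditions into one decision.
    return never_analyzed(stats) or modified_since_analyze(stats)
```

Keeping the two predicates separate shows that they are independent checks
which happen to be combined with a single OR.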

> Thanks a lot for the detailed feedback; this has been very helpful. Answering all mails in one.
>
> A few clarifications on intent and scope, and how this relates to the points raised:
>
> Autovacuum overlap
> I agree there is some conceptual overlap with autovacuum’s analyze decision logic.
> The intent here is not to replace or duplicate autovacuum heuristics, but to reduce

Yes, I agree with this.

> I agree that n_mod_since_analyze == 0 is a very simple condition
> and not “smart” in the general sense. That is intentional for now.
> This option is not trying to answer when statistics should be refreshed optimally,
> but only to skip relations that are known to be unchanged since the last analyze.
> If even a single tuple is modified, SMART ANALYZE will still re-run, preserving
> conservative behavior.

Yes, this is my concern. Why would I want to analyze if only 1 row or a
negligible number of rows were modified? I understand that this feature is
trying to keep the decision making very simple, but I think it's too simple
to actually be helpful in addressing the wasted effort of an ANALYZE command.
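
For comparison only, here is a sketch of a threshold-style check modeled on
the formula autovacuum uses for its analyze decision (base threshold plus
scale factor times the number of tuples). Nothing in this thread proposes
this for the new option. The function names are hypothetical, and the
defaults are simply the stock values of autovacuum_analyze_threshold (50)
and autovacuum_analyze_scale_factor (0.1).

```python
def analyze_threshold(reltuples: float,
                      base_threshold: int = 50,
                      scale_factor: float = 0.1) -> float:
    """Number of modified tuples a table must exceed before it is analyzed."""
    # reltuples can be negative for a table with no row estimate yet;
    # treat that as an empty table.
    return base_threshold + scale_factor * max(reltuples, 0.0)


def exceeds_analyze_threshold(n_mod_since_analyze: int,
                              reltuples: float,
                              base_threshold: int = 50,
                              scale_factor: float = 0.1) -> bool:
    """True when the modifications since the last analyze exceed the threshold."""
    threshold = analyze_threshold(reltuples, base_threshold, scale_factor)
    return n_mod_since_analyze > threshold
```

With these defaults, a single modified row in a million-row table stays far
below the threshold, whereas the n_mod_since_analyze == 0 test would analyze
the table again.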

> Tables never analyzed
> As Christoph and Ilia pointed out earlier, skipping tables that were never analyzed would be incorrect.
> The current logic explicitly avoids that by requiring last_analyze or last_autoanalyze to be present
> before skipping. Tables without prior statistics are always analyzed.

I agree with this, but I think it's more than just tables that have
not been analyzed. What if a new column is added after the last
(auto)analyze? Would we not want to trigger an analyze in that case?

> Relation to vacuumdb --missing-stats-only
> I agree this is related but slightly different in intent. --missing-stats-only
> answers “does this table have any statistics at all?”, while SMART ANALYZE
> answers “has this table changed since the last statistics collection?”. Both seem
> useful, but they target different use cases. I see SMART ANALYZE primarily
> as a performance optimization for repeated manual ANALYZE runs on mostly-static schemas.

SMART ANALYZE is trying to answer 2 questions: "which table does not
have any statistics at all?" and "has this table changed since the last
statistics collection?", right?

So, maybe they need to be 2 separate options.

> Although, as Sami said, this SMART is not as smart as it should be,
> I will change the name accordingly in further patches

Yup, I am not too fond of SMART in the name. Also, the name itself
is vague. SKIP_LOCKED and BUFFER_USAGE_LIMIT, on the other
hand, tell you exactly what they're used for.

--
Sami Imseih
Amazon Web Services (AWS)
