| From: | Sami Imseih <samimseih(at)gmail(dot)com> |
|---|---|
| To: | VASUKI M <vasukianand0119(at)gmail(dot)com> |
| Cc: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Christoph Berg <myon(at)debian(dot)org> |
| Subject: | Re: Optional skipping of unchanged relations during ANALYZE? |
| Date: | 2026-01-23 20:31:26 |
| Message-ID: | CAA5RZ0tYuhHapyVBTw8tVfrKp6fyS5YBTVdQhYGOcWFg-ERyFA@mail.gmail.com |
| Lists: | pgsql-hackers |
Thanks for the detailed summary!
It is important to point out that this feature is trying to do 2 distinct
things in 1 command. It runs ANALYZE when either one of these conditions
is true:
1/ Table has not been analyzed yet.
2/ Table has been modified.
> Thanks a lot for the detailed feedback; this has been very helpful. Answering all mails in one.
>
> A few clarifications on intent and scope, and how this relates to the points raised:
>
> Autovacuum overlap
> I agree there is some conceptual overlap with autovacuum’s analyze decision logic.
> The intent here is not to replace or duplicate autovacuum heuristics, but to reduce
Yes, I agree with this.
> I agree that n_mod_since_analyze == 0 is a very simple condition
> and not “smart” in the general sense. That is intentional for now.
> This option is not trying to answer when statistics should be refreshed optimally,
> but only to skip relations that are known to be unchanged since the last analyze.
> If even a single tuple is modified, SMART ANALYZE will still re-run, preserving
> conservative behavior.
Yes, this is my concern. Why would I want to analyze if 1 row or a negligible
number of rows has been modified? I understand that this feature is trying to
keep the decision making very simple, but I think it's too simple to actually
be helpful in addressing the wasted effort of an ANALYZE command.
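
To make the concern concrete, here is a minimal sketch (Python, not
PostgreSQL code) that contrasts the quoted "any modification" rule with a
threshold in the style of autovacuum's analyze trigger (base threshold plus
scale factor times row count). The class and function names are
hypothetical, and the thread does not propose the threshold variant; it is
shown only as an assumption for comparison. The defaults 50 and 0.1 mirror
the default values of autovacuum_analyze_threshold and
autovacuum_analyze_scale_factor.

```python
from dataclasses import dataclass


@dataclass
class RelStats:
    """Hypothetical per-table inputs, named after the catalog/stat columns."""
    reltuples: float          # estimated row count (pg_class.reltuples)
    n_mod_since_analyze: int  # rows modified since the last analyze


def needs_analyze_strict(s: RelStats) -> bool:
    # Rule described in the quoted text: re-analyze if anything changed.
    return s.n_mod_since_analyze > 0


def needs_analyze_threshold(s: RelStats,
                            base_threshold: int = 50,
                            scale_factor: float = 0.1) -> bool:
    # Assumed alternative: autovacuum-style threshold.
    # max() guards against a negative "unknown" row estimate.
    threshold = base_threshold + scale_factor * max(s.reltuples, 0.0)
    return s.n_mod_since_analyze > threshold


# One modified row in a million-row table.
big = RelStats(reltuples=1_000_000, n_mod_since_analyze=1)
print(needs_analyze_strict(big), needs_analyze_threshold(big))
```

Under the strict rule the single modified row forces a full ANALYZE of the
large table, while the threshold variant would skip it.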
> Tables never analyzed
> As Christoph and Ilia pointed out earlier, skipping tables that were never analyzed would be incorrect.
> The current logic explicitly avoids that by requiring last_analyze or last_autoanalyze to be present
> before skipping. Tables without prior statistics are always analyzed.
I agree with this, but I think it's more than just tables that have
not been analyzed.
What if a new column is added after the last (auto)analyze? Would we not want to
trigger an analyze in that case?
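
For reference, the skip rule as described in the quoted text can be
sketched as follows. This is an illustration in Python, not the patch
itself; the field names follow the pg_stat_all_tables columns mentioned in
the thread, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RelActivity:
    """Hypothetical view of one table's statistics-collector row."""
    last_analyze: Optional[datetime]
    last_autoanalyze: Optional[datetime]
    n_mod_since_analyze: int


def can_skip_analyze(r: RelActivity) -> bool:
    # Never-analyzed tables must always be analyzed.
    ever_analyzed = (r.last_analyze is not None
                     or r.last_autoanalyze is not None)
    # Skip only when analyzed before AND nothing was modified since.
    return ever_analyzed and r.n_mod_since_analyze == 0


never = RelActivity(None, None, 0)
unchanged = RelActivity(datetime(2026, 1, 1), None, 0)
print(can_skip_analyze(never), can_skip_analyze(unchanged))
```

None of the three inputs directly reflects a schema change, so a predicate
of this shape has no way to notice a column added after the last
(auto)analyze, which is the gap the question above points at.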
> Relation to vacuumdb --missing-stats-only
> I agree this is related but slightly different in intent. --missing-stats-only
> answers “does this table have any statistics at all?”, while SMART ANALYZE
> answers “has this table changed since the last statistics collection?”. Both seem
> useful, but they target different use cases. I see SMART ANALYZE primarily
> as a performance optimization for repeated manual ANALYZE runs on mostly-static schemas.
SMART ANALYZE is trying to answer 2 questions: "which table does not
have any statistics at all?"
and "has this table changed since the last statistics collection?", right?
So, maybe they need to be 2 separate options.
> Although, as Sami said, this SMART is not as smart as it should be,
> I will change the name accordingly in further patches
Yup, I am not too fond of SMART in the name. Also, the name itself
is vague. SKIP_LOCKED and BUFFER_USAGE_LIMIT, on the other
hand, tell you exactly what they're used for.
--
Sami Imseih
Amazon Web Services (AWS)