| From: | VASUKI M <vasukianand0119(at)gmail(dot)com> |
|---|---|
| To: | Sami Imseih <samimseih(at)gmail(dot)com> |
| Cc: | Robert Treat <rob(at)xzilla(dot)net>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org> |
| Subject: | Re: Optional skipping of unchanged relations during ANALYZE? |
| Date: | 2026-01-27 05:11:09 |
| Message-ID: | CAE2r8H5+kdis5JZxte_i7f1X8AVyUedh_PW1Kc5iOHaFV9=7Qw@mail.gmail.com |
| Lists: | pgsql-hackers |
>Maybe this is all an aside, but I don't think that was the vision for
>what the OP was trying to do with his patch, in that sense he was
>approaching it from a different angle, and I've been reading this
>thread trying to decide if people are just talking past each other.
>But after thinking about it some more, I think the above might be the
>more useful mental model for the discussion.
Hi Robert,
Thanks for taking the time to step back and summarize the discussion;
that’s very helpful.
I agree that part of the thread drifted toward broader questions about
when ANALYZE should ideally run, whereas my original
intent was narrower: providing a simple, explicit way to skip work that is
known to be unnecessary when running ANALYZE
across many relations. I appreciate you calling out that distinction.
Also, just a small note: I’m a she, not a he :)
Coming to the other mails:
Thanks everyone, this discussion has been extremely helpful.
I agree with the framing that has emerged here: there are really two
separate questions involved:
1) Which relations are missing statistics entirely?
2) Which relations have statistics, but may need them refreshed due to
modifications?
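To make the distinction concrete, here is a toy Python sketch that treats the two questions as separate predicates. It is only an illustration, not how ANALYZE or vacuumdb work internally: RelInfo, has_stats, missing_stats, and modified_stats are hypothetical names, and the single has_stats flag is a simplification of what vacuumdb --missing-stats-only actually checks (statistics for each column, index expression, and extended statistics object).

```python
from dataclasses import dataclass


# Hypothetical, simplified view of a relation. has_stats stands in for a
# catalog check (are any statistics stored?); n_mod_since_analyze mirrors
# the cumulative statistics counter of the same name.
@dataclass
class RelInfo:
    name: str
    has_stats: bool
    n_mod_since_analyze: int


def missing_stats(rel: RelInfo) -> bool:
    """Question 1: the relation has no statistics at all."""
    return not rel.has_stats


def modified_stats(rel: RelInfo, threshold: int = 0) -> bool:
    """Question 2: statistics exist, but more than `threshold`
    modifications have been counted since the last ANALYZE."""
    return rel.has_stats and rel.n_mod_since_analyze > threshold


rels = [
    RelInfo("orders", has_stats=False, n_mod_since_analyze=0),
    RelInfo("customers", has_stats=True, n_mod_since_analyze=1200),
    RelInfo("regions", has_stats=True, n_mod_since_analyze=0),
]

print([r.name for r in rels if missing_stats(r)])   # ['orders']
print([r.name for r in rels if modified_stats(r)])  # ['customers']
```

The first predicate depends only on what is stored in the catalogs, while the second depends on a counter: if the cumulative statistics are reset (for example after a crash), n_mod_since_analyze reads zero and a changed relation can look unmodified.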
My original SMART ANALYZE prototype was trying to approximate both with
a very simple rule, but I agree that this makes the option vague and
harder to reason about, especially once cases like new columns, crash
recovery, extended statistics, and persistence are considered.
The idea of introducing explicit options such as ANALYZE (MISSING_STATS)
and ANALYZE (MODIFIED_STATS) feels like a much cleaner direction.
In particular, starting with MISSING_STATS as a SQL-level equivalent of
vacuumdb --missing-stats-only seems like a well-scoped and low-risk
first step.
I also agree that relying solely on pg_stat counters (e.g.
n_mod_since_analyze) has limitations, since they are not preserved
across crashes, which further supports handling “missing stats” separately
via catalog inspection.
I’m happy to pivot in this direction and focus first on a clear,
well-defined MISSING_STATS option for ANALYZE, and then revisit
MODIFIED_STATS (possibly reusing autoanalyze-style thresholds) as a
follow-up, once there is agreement on the semantics.
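For reference, the threshold autovacuum uses to decide when a table should be auto-analyzed, as described in the PostgreSQL documentation, is shown below. Whether MODIFIED_STATS would reuse exactly this rule is only an assumption at this point, since its semantics are still open.

```latex
\text{analyze threshold} = \texttt{autovacuum\_analyze\_threshold}
  + \texttt{autovacuum\_analyze\_scale\_factor} \times \texttt{reltuples}
```

A table qualifies once the number of tuples inserted, updated, or deleted since its last ANALYZE exceeds this value. With the defaults (50 and 0.1), a table with 10,000 tuples would qualify after more than 50 + 0.1 × 10000 = 1050 modifications.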
Thanks again for the thoughtful feedback; it’s been very educational.
Best regards,
Vasuki M
C-DAC, Chennai.