| From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
|---|---|
| To: | Sami Imseih <samimseih(at)gmail(dot)com>, VASUKI M <vasukianand0119(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Christoph Berg <myon(at)debian(dot)org> |
| Subject: | Re: Optional skipping of unchanged relations during ANALYZE? |
| Date: | 2026-01-22 21:22:04 |
| Message-ID: | 6b5d7f83-7ac8-47b1-ab7f-0040b65ad02a@tantorlabs.com |
| Lists: | pgsql-hackers |
I spent some more time thinking about this new option.
On 22.01.2026 23:18, Sami Imseih wrote:
> I can't help but think that this SMART option is not as smart as it
> should be to actually
> be valuable.
>
> I agree that we should never skip a table that has never been
> analyzed. My concern
> is that n_mod_since_analyze == 0 is not very useful.
IMO, for the purpose of ensuring that we never skip relations that have
never been analyzed, checking last_analyze / last_autoanalyze being NULL
seems sufficient and reliable.
> What if I modify
> 1 tuple? does
> that really justify an ANALYZE to run on the table? Shouldn't the
> decision be driven based
> on some threshold calculation; similar to how autoanalyze makes the decision?
The primary purpose of ANALYZE is to allow users to explicitly rebuild
statistics when they believe it is necessary. When a user specifies
particular tables or columns (e.g., ANALYZE table; or ANALYZE
table(i, j);), I would not expect them to use this new option - in that
case, the intent is usually to force statistics to be recollected.
However, the situation looks different when ANALYZE is run across the
entire database (i.e., plain ANALYZE;). In that context, having an
option to skip relations that are known not to have changed since their
last analyze seems useful, as it avoids doing work that is clearly
unnecessary. That said, I think we still need to be precise about
what exactly "relations that have not changed" means in this context, in
order to understand where statistics would and would not be rebuilt. In
particular, relying solely on n_mod_since_analyze == 0 does not seem
sufficient, as we have already discussed several cases where ANALYZE may
still be required even without direct data modifications (e.g.,
partitioned tables, inheritance, foreign tables, extended statistics).
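
To make the question concrete, here is a rough standalone sketch of one
conservative way such a check could be written. It is not code from the
patch: the RelAnalyzeInfo struct, its fields, and can_skip_analyze() are
hypothetical names for illustration only, and treating every special
case as "never skip" is just one possible answer to the question above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-relation snapshot; none of these names exist in
 * PostgreSQL. The first three fields mirror pg_stat_all_tables columns.
 */
typedef struct RelAnalyzeInfo
{
    bool    has_last_analyze;       /* last_analyze IS NOT NULL */
    bool    has_last_autoanalyze;   /* last_autoanalyze IS NOT NULL */
    int64_t n_mod_since_analyze;    /* rows modified since last analyze */
    bool    is_partitioned;         /* partitioned table */
    bool    has_children;           /* inheritance parent */
    bool    is_foreign;             /* foreign table */
    bool    has_extended_stats;     /* extended statistics objects exist */
} RelAnalyzeInfo;

/* Returns true only when skipping looks safe under these assumptions. */
static bool
can_skip_analyze(const RelAnalyzeInfo *rel)
{
    /* Never skip a relation that has never been analyzed. */
    if (!rel->has_last_analyze && !rel->has_last_autoanalyze)
        return false;

    /* Any recorded modification means the relation has changed. */
    if (rel->n_mod_since_analyze != 0)
        return false;

    /*
     * Assumption: the counter alone is not trusted for these cases,
     * so they are always analyzed.
     */
    if (rel->is_partitioned || rel->has_children ||
        rel->is_foreign || rel->has_extended_stats)
        return false;

    return true;
}
```

With this shape, only a plain table that has been analyzed at least once
and shows no modifications since then would be skipped.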
About thresholds: I’m not convinced they make much sense for manual
ANALYZE. autovacuum already exists to decide when statistics need to be
refreshed based on thresholds, and if those conditions are met, it will
run automatically. I’m not sure there is much value in duplicating that
logic for explicit ANALYZE commands.
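
For reference, the autoanalyze decision is roughly the comparison
sketched below: a base threshold plus a scale factor times the estimated
row count, compared against the number of changes since the last
analyze. This is a simplified standalone illustration, not the
autovacuum code itself; the function name and parameter names are made
up, and the values in the usage note assume the default settings
autovacuum_analyze_threshold = 50 and
autovacuum_analyze_scale_factor = 0.1.

```c
#include <stdbool.h>

/*
 * Simplified sketch of the autoanalyze trigger condition.
 * Hypothetical helper; the real logic lives in autovacuum and reads
 * its inputs from GUCs, reloptions and the statistics system.
 */
static bool
autoanalyze_would_trigger(double reltuples,
                          double changes_since_analyze,
                          double base_thresh,
                          double scale_factor)
{
    /* threshold = base threshold + scale factor * estimated row count */
    double anl_thresh = base_thresh + scale_factor * reltuples;

    return changes_since_analyze > anl_thresh;
}
```

Under those defaults, a 1000-row table needs more than 150 changed rows
before autoanalyze fires, so the single modified tuple from the question
above would not trigger it.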
What do you think?
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/