Re: Optional skipping of unchanged relations during ANALYZE?

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: VASUKI M <vasukianand0119(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>
Subject: Re: Optional skipping of unchanged relations during ANALYZE?
Date: 2026-02-19 18:23:35
Message-ID: CAA5RZ0v02tJ9uBnhR4Uf4EEte4fsLiL3uKXxO+Fcs_-D2kESUw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I am attaching v5 of the MISSING_STATS_ONLY patch.

Thanks for the patch. Here are some comments I have:

1/
+relation_has_missing_column_stats(Relation rel)
+{
+ AttrNumber attnum;
+
+ for (attnum = 1; attnum <= rel->rd_att->natts; attnum++)
+ {
+ VacAttrStats *stats;
+ HeapTuple statstup;
+ bool has_stats = false;
+
+ /*
+ * Step 1: Is this attribute analyzable at all?
+ * (skips dropped columns, system cols, attstattarget=0, etc.)
+ */
+ stats = examine_attribute(rel, attnum, NULL);

- and -

+ /*
+ * ANALYZE (MISSING_STATS_ONLY):
+ * Skip relation if it has no missing column or extended statistics.
+ */
+ if (params.options & VACOPT_MISSING_STATS_ONLY)
+ {
+ if (!relation_has_missing_column_stats(onerel) &&
+ !relation_has_missing_extended_stats(onerel))
+ {
+ elog(DEBUG1,
+ "ANALYZE (MISSING_STATS_ONLY):
skipping relation \"%s\"",
+ RelationGetRelationName(onerel));
+

This approach seems off. relation_has_missing_column_stats() and
elation_has_missing_extended_stats()
are being called early, and relation_has_missing_column_stats() a also
calls examine_attribute() for all attributes.
Later on, examine_attribute() is called again for the required
attributes, so it ends up running twice.

I think the natural thing to do here is to make
relation_has_missing_column_stats() take in
an attnum to perform the lookup in the stats catalog on a specific
attribute. This call should occur inside
examine_attribute().
examine_attribute() can also take in a boolean, missing_stats_only, so
we only look up
the stats catalog only when the missing_stats_only option is
specified. Columns that already
have stats simply return NULL, causing them to be skipped naturally.

With that, inside do_analyze_rel(), we can then check if attr_cnt = 0,
the number of expr_indexes = 0
and we are not missing extended stats, we can just skip the relation
from analyze.

Something like this:

```
if (params.options & VACOPT_MISSING_STATS_ONLY &&
(attr_cnt == 0 && expr_cnt == 0 &&
!relation_has_missing_extended_stats(onerel)))
{
vac_close_indexes(nindexes, Irel, NoLock);
return;
}
```

2/
I think we need more test coverage, inheritance partitions,
declarative partitions,
index on expressions, ANALYZE with a column specified, etc.

3/

+ elog(DEBUG1,
+ "ANALYZE (MISSING_STATS_ONLY):
skipping relation \"%s\"",
+ RelationGetRelationName(onerel));

I think logging should be at info level, and we should only log when
skipping a relation

The following seems unnecessary.

+ elog(DEBUG1,
+ "ANALYZE (MISSING_STATS_ONLY): relation
eligible \"%s\"",
+ RelationGetRelationName(onerel));
+ }
+
+
/*
* OK, let's do it. First, initialize progress reporting.
*/
@@ -314,6 +451,10 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
PgStat_Counter startreadtime = 0;
PgStat_Counter startwritetime = 0;

+ elog(DEBUG1, "ANALYZE processing relation \"%s\" (OID %u)",
+ RelationGetRelationName(onerel),
+ RelationGetRelid(onerel));

Also, I think it's better to use ereport like is being done for the other
ANALYZE logging, i.e.

```
appendStringInfo(&buf,
_("WAL usage:
%" PRId64 " records, %" PRId64 " full page images, %" PRIu64 " bytes,
%" PRIu64 " full page image bytes, %" PRId64 " buffers full\n"),
walusage.wal_records,
walusage.wal_fpi,
walusage.wal_bytes,
walusage.wal_fpi_bytes,

walusage.wal_buffers_full);
appendStringInfo(&buf, _("system usage: %s"),
pg_rusage_show(&ru0));

ereport(verbose ? INFO : LOG,
(errmsg_internal("%s", buf.data)));
```

4/
Make sure to run pgindent

> And while I think you might be able to
> argue that MODIFIED_STATS should also include MISSING_STATS (I do
> wonder though, does autoanalyze do that?)

No, autoanalyze decisions are driven purely off of thresholds.

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ants Aasma 2026-02-19 18:24:17 Re: Hash aggregate collisions cause excessive spilling
Previous Message Alexandre Felipe 2026-02-19 18:12:41 Idea memory credit system