Re: Optional skipping of unchanged relations during ANALYZE?

From: VASUKI M <vasukianand0119(at)gmail(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>
Subject: Re: Optional skipping of unchanged relations during ANALYZE?
Date: 2026-02-13 12:14:30
Message-ID: CAE2r8H7vnihrOE5i+RvpU72jcyRkxMkhCY9mzBb+PBrTzSmo5g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

First of all, sorry for the delay in posting the updated patch. I was
occupied with some other work and holidays, and it took me some time to
revisit the design carefully. After several iterations and reworking the
logic based on the earlier feedback, I have now arrived at what I believe
is a cleaner and more aligned implementation. I am attaching v4 of the
patch for review.

The intention is to provide a SQL-level equivalent of vacuumdb
--analyze-only --missing-stats-only, while keeping the default ANALYZE
behavior completely unchanged. When this option is specified, ANALYZE will
process only those relations that are missing statistics, and skip
relations that already have complete statistics.

A relation is considered to be missing statistics if at least one
analyzable attribute (as determined by examine_attribute()) does not have a
corresponding row in pg_statistic, or if there is an extended statistics
object in pg_statistic_ext without a matching row in pg_statistic_ext_data.
In other words, the implementation relies on catalog inspection rather than
counters or timestamps, and it reuses existing ANALYZE internals instead of
redefining missing-stats logic independently.

The check is placed inside analyze_rel(), after the standard relation
validation and skip conditions, so that it does not interfere with
privilege checks or special-relation handling. The default ANALYZE path
remains unchanged.

In terms of behavior:

* A brand new empty table is analyzed (since it has no statistics).
* Re-running on an empty table analyzes again, because there are still no
pg_statistic rows.
* A table with data but no statistics is analyzed.
* Re-running after statistics exist causes the table to be skipped.
* If a new column is added and lacks statistics, the table is analyzed
again.
* After statistics are created for that column, subsequent runs skip the
table.
* If statistics are manually deleted or effectively lost (e.g., crash
recovery scenarios affecting stats tracking), the table is analyzed again.

Repeated runs therefore converge toward a no-op once all relations have
complete statistics.

Regression tests are included.

As discussed earlier in the thread, I plan to start a new discussion and
patch series for a separate ANALYZE (MODIFIED_STATS) option that would
reuse autoanalyze-style thresholds. I believe keeping MISSING_STATS_ONLY
and MODIFIED_STATS as separate, clearly defined options makes the semantics
easier to reason about.

I would greatly appreciate further review and feedback on this version.
Thank you all for the detailed guidance and suggestions so far — especially
regarding reuse of examine_attribute() and alignment with vacuumdb
behavior. This process has been very educational for me.

Thanks,
Vasuki M
C-DAC, Chennai

Attachment Content-Type Size
v4-0001-ANALYZE-add-MISSING_STATS_ONLY-option.patch text/x-patch 10.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2026-02-13 12:58:30 Re: Report bytes and transactions actually sent downtream
Previous Message Nitin Motiani 2026-02-13 12:05:14 Re: [PATCH] Support reading large objects with pg_read_all_data