| From: | VASUKI M <vasukianand0119(at)gmail(dot)com> |
|---|---|
| To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
| Cc: | Sami Imseih <samimseih(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Andreas Karlsson <andreas(at)proxel(dot)se>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org> |
| Subject: | Re: Optional skipping of unchanged relations during ANALYZE? |
| Date: | 2026-03-02 12:48:05 |
| Message-ID: | CAE2r8H7hYGYi4QM85Q7bxs4RbT0Vn63c9ONFbwhAjuSGDzah_A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello Robert, Sami, Ilia, and everyone,
Thank you all for the detailed review and thoughtful feedback. I have
carefully gone through the comments and revised the patch accordingly. I
truly appreciate the guidance — it helped clarify both the design and the
implementation details.
Regarding Robert’s point about the two distinct use cases, I agree that
MISSING_STATS_ONLY and MODIFIED_STATS represent separate concerns.
MISSING_STATS_ONLY is catalog-driven and persistent in nature, ensuring
that newly created tables, newly added columns, and newly defined extended
statistics are not left without statistics. MODIFIED_STATS, on the other
hand, is more closely aligned with modification thresholds and
autoanalyze-like behavior, which is transient and threshold-based.
Keeping these concerns separate makes the semantics clearer and easier to
reason about. In particular, as Robert mentioned, MISSING_STATS_ONLY must
remain a standalone flag for scenarios such as integration into vacuumdb,
where ensuring that missing statistics are generated is the primary goal.
To align with the CommitFest process, I have created a separate entry for
this feature here:[1]
I will treat MODIFIED_STATS as a separate patch and discussion thread.
Addressing Sami’s technical comments:
1. Duplicate examine_attribute() calls
You were absolutely correct that the earlier version of the patch caused
examine_attribute() to be invoked twice for each attribute: once in
relation_has_missing_column_stats() and again during normal attribute
processing. That approach was inefficient and awkward.
In the revised version, I have removed the early relation-level column
scanning logic. Instead, the missing-statistics check is now integrated
directly inside examine_attribute(). The function now takes a boolean
missing_stats_only parameter. When this option is specified,
examine_attribute() performs the pg_statistic lookup for that specific
attribute. If statistics already exist, the function simply returns NULL,
causing the column to be skipped naturally.
This ensures:
-Each attribute is examined only once.
-The missing-statistics logic fits naturally into the existing flow.
-We avoid redundant catalog lookups.
Inside do_analyze_rel(), after collecting vacattrstats, we now check
whether:
-attr_cnt == 0
-no expression index attributes remain
-and there are no missing extended statistics
If all of these are true under MISSING_STATS_ONLY, the relation is skipped
cleanly.This follows the structure you suggested and simplifies the overall
design.
2. Test coverage
I have expanded the regression tests to cover the scenarios you
mentioned.This ensures that MISSING_STATS_ONLY behaves correctly across
more complex schema configurations.
3. Logging behavior
Thank you for the guidance on logging.I have removed the DEBUG-level elog()
calls introduced in earlier versions. Logging is now consistent with
existing ANALYZE behavior and uses ereport().
When VERBOSE mode is enabled and a relation is skipped due to
MISSING_STATS_ONLY, the output now includes a single INFO-level message of
the form:
INFO: Skipping analyzing "database.namespace.relation"
This matches the style used elsewhere in do_analyze_rel() and avoids
unnecessary additional details when no statistics are collected.There is no
additional logging when a relation is processed normally, keeping behavior
aligned with existing ANALYZE semantics.
Regarding autoanalyze and MODIFIED_STATS
As Ilia clarified, autoanalyze decisions are purely threshold-driven and do
not include missing-statistics logic. That reinforces the decision to keep
MISSING_STATS_ONLY separate and explicit.[2]
If there are further suggestions regarding semantics, naming, or additional
edge cases that should be tested, I would be very happy to incorporate them.
Thank you again for your time and detailed review.
Regards,
Vasuki M
C-DAC,Chennai
[1] https://commitfest.postgresql.org/patch/6516/
[2]
https://www.postgresql.org/message-id/flat/aZSm77WEh8pxQYtf%40nathan#1ab5c06a7d2247d90e71fb995fa21a39
>
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0001-ANALYZE-add-MISSING_STATS_ONLY-option.patch | text/x-patch | 16.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David Geier | 2026-03-02 12:50:37 | Re: Convert NOT IN sublinks to anti-joins when safe |
| Previous Message | Tatsuya Kawata | 2026-03-02 12:47:23 | Re: [Patch]Add tab completion for DELETE ... USING |