Re: Optional skipping of unchanged relations during ANALYZE?

From: VASUKI M <vasukianand0119(at)gmail(dot)com>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Christoph Berg <myon(at)debian(dot)org>
Subject: Re: Optional skipping of unchanged relations during ANALYZE?
Date: 2026-01-22 09:19:00
Message-ID: CAE2r8H7BwgYX61eiz4XrkWThxNhKKbp9N3Vty9HCrT3JTaVmzA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi llia,

On Wed, Jan 21, 2026 at 4:19 PM Ilia Evdokimov <
ilya(dot)evdokimov(at)tantorlabs(dot)com> wrote:

> On 21.01.2026 12:56, VASUKI M wrote:
>
> On Wed, Jan 21, 2026 at 3:21 PM Christoph Berg <myon(at)debian(dot)org> wrote:
>
>> SMART is also a terribly non-descriptive name. How about CHANGED_ONLY?
>>
>
> Yeah i agree,as of now i am focusing on concept workflow will change name
> in next versions of patch.
>
> Regards,
> Vasuki M
> C-DAC,Chennai.
>
> So do I
>
>
> It seems to me that the condition for relations that have never had
> statistics collected might be incorrect. If I'm reading this correctly,
> shouldn't this be checking 'tabstat->mod_since_analyze > 0' instead of
> 'tabstat->mod_since_analyze == 0'? I tested it on simple query:
>
> CREATE TABLE t (i INT, j INT);
> INSERT INTO t SELECT i/10, i/100 FROM generate_series(1, 1000000) i;
> ANALYZE (SMART) t;
> SELECT COUNT(*) FROM pg_stats WHERE tablename = 't';
> count
> -------
> 0
> (1 row)
>

This passes now :)

As discussed in the recent thread, I am sharing a revised v2 patch that
introduces an optional SMART mode for ANALYZE.

When ANALYZE (SMART) is specified, relations are skipped if:
- they have been analyzed before (either manually or via autovacuum),
and
- they have not been modified since their last analyze
(n_mod_since_analyze = 0, based on pg_stat statistics).

Relations that have never been analyzed before are always analyzed
normally. The default ANALYZE behavior remains unchanged unless SMART
is explicitly requested.

The motivation is to reduce unnecessary ANALYZE work in databases with
a large number of mostly-static tables, while keeping the behavior
strictly opt-in.

Changes and clarifications in v2:
- Tables that have never been analyzed are never skipped
(checked via last_analyze_time / last_autoanalyze_time)
- Skip decisions rely only on pg_stat_user_tables counters
- The skip condition is n_mod_since_analyze == 0
- Regression tests are added to demonstrate:
-->SMART ANALYZE does not skip never-analyzed tables
-->Only modified tables are re-analyzed

This patch intentionally limits its scope to regular relations and
existing pg_stat statistics only. Partitioned tables, inheritance,
foreign tables, extended statistics, and statistics target changes are
not handled yet and can be considered in follow-up work based on
feedback.

The patch applies cleanly on current master and passes:
make distclean
./configure
make -j$(nproc)
make install
make check

See this:

analyze_test=# create table sa6 (id int);
CREATE TABLE
Time: 3.917 ms
analyze_test=# analyze(smart) sa6;
DEBUG: ANALYZE processing relation "sa6" (OID 131324)
ANALYZE
Time: 0.585 ms
analyze_test=# SELECT count(*) > 0 AS stats_created
FROM pg_stats
WHERE tablename = 'sa6';
stats_created
---------------
f
(1 row)

Time: 0.894 ms
analyze_test=# SELECT relname,
last_analyze,
n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'sa6';
relname | last_analyze | n_mod_since_analyze
---------+----------------------------------+---------------------
sa6 | 2026-01-22 10:35:23.005045+05:30 | 0
(1 row)

The empty table doesn't have any stats to show as pg_stat is column level
statistics;
these are created when rows exists ,it has 0 rows to make samples,most
common used values,etc,..so no data distribution

But when value is inserted ,

analyze_test=# CREATE TABLE sa4 (i int);
CREATE TABLE
Time: 10.290 ms
analyze_test=# INSERT INTO sa4 SELECT generate_series(1,10);
INSERT 0 10
Time: 45.373 ms
analyze_test=# analyze(smart) sa4;
DEBUG: ANALYZE processing relation "sa4" (OID 131310)
ANALYZE
Time: 47.771 ms
analyze_test=# SELECT count(*) > 0 AS stats_created
FROM pg_stats
WHERE tablename = 'sa4';
stats_created
---------------
t
(1 row)

Time: 0.945 ms

I would appreciate feedback on the overall approach.

Thanks for your time and review.

--
Best regards,
Vasuki M
C-DAC,Chennai

Attachment Content-Type Size
v2-0001-ANALYZE-Introduce-an-opt-in-SMART-option.patch text/x-patch 9.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-01-22 09:32:38 Re: DOC: fixes multiple errors in alter table doc
Previous Message Chao Li 2026-01-22 09:01:38 Re: tablecmds: reject CLUSTER ON for partitioned tables earlier