Re: Allow to collect statistics on virtual generated columns

From: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
To: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Cc: Andres Freund <andres(at)anarazel(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow to collect statistics on virtual generated columns
Date: 2025-12-31 10:44:55
Message-ID: 20251231194455.5975fda68ed8e941e4afb805@sraoss.co.jp
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2 Sep 2025 16:33:41 +0900
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:

> > > > This allows to collect statistics on virtual generated columns.
> > > >
> > > > During ANALYZE, generation expressions are expanded, and statistics are computed
> > > > using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData
> > > > are now exported from extended_stats.c. However, since they are no longer specific
> > > > to extended statistics, it might be better to move them to analyze.c and vacuum.h.
> > > >
> > > > To enable the optimizer to make use of these statistics, a new field named
> > > > virtual_gencols is added to RelOptInfo. This field holds the expressions of
> > > > virtual generated columns in the table. In examine_variable(), if an expression
> > > > in a WHERE clause matches a virtual generated column, the corresponding statistics
> > > > are used for that expression.
> > > >
> > > > Example:
> > > >
> > > > - Before applying the patch, the cardinality estimate is erroneous.
> > > >
> > > > test=# create table t (i int, j int generated always as (i*10) virtual);
> > > > CREATE TABLE
> > > > test=# insert into t select generate_series(1,1000);
> > > > INSERT 0 1000
> > > > test=# insert into t select 1 from generate_series(1,1000);
> > > > INSERT 0 1000
> > > > test=# analyze t;
> > > > ANALYZE
> > > > test=# explain analyze select * from t where j = 10;
> > > > QUERY PLAN
> > > > -------------------------------------------------------------------------------------------------
> > > > Seq Scan on t (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1)
> > > > Filter: ((i * 10) = 10)
> > > > Rows Removed by Filter: 999
> > > > Buffers: shared hit=9
> > > > Planning:
> > > > Buffers: shared hit=10
> > > > Planning Time: 0.299 ms
> > > > Execution Time: 0.948 ms
> > > > (8 rows)
> > > >
> > > >
> > > > - After applying the patch, the cardinality estimate is correct.
> > > >
> > > > test=# analyze t;
> > > > ANALYZE
> > > > test=# explain analyze select * from t where j = 10;
> > > > QUERY PLAN
> > > > -----------------------------------------------------------------------------------------------------
> > > > Seq Scan on t (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1)
> > > > Filter: ((i * 10) = 10)
> > > > Rows Removed by Filter: 999
> > > > Buffers: shared hit=9
> > > > Planning:
> > > > Buffers: shared hit=6
> > > > Planning Time: 0.374 ms
> > > > Execution Time: 1.028 ms
> > > > (8 rows)
> > > >
> > > >
> > > > Note that the patch is still a work in progress, so documentation and tests are not included.
> > >
> > > I've attached an updated patch.
> > >
> > > I modified the documentation to remove the statement that virtual generated columns
> > > do not have statistics.
> > >
> > > In addition, I added a test to ensure that statistics on virtual generated columns
> > > are available.
>

I've attached an updated patch that fixes the broken test since 10c4fe074a.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>

Attachment Content-Type Size
v5-0001-Allow-to-collect-statistics-on-virtual-generated-.patch text/x-diff 14.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2025-12-31 11:35:51 Add "format" target to make and ninja to run pgindent and pgperltidy
Previous Message Chao Li 2025-12-31 10:36:43 Re: A typo in .cirrus.tasks.yml