From: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
---|---|
To: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Allow to collect statistics on virtual generated columns |
Date: | 2025-08-20 05:10:28 |
Message-ID: | 20250820141028.dc06e0d1d787ce1151f69ed6@sraoss.co.jp |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 8 Aug 2025 12:21:25 +0900
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
> On Fri, 1 Aug 2025 00:28:30 +0900
> Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>
> > Hi,
> >
> > On Tue, 24 Jun 2025 17:05:33 +0900
> > Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
> >
> > > Instead, I'm thinking of an alternative approach: expanding the expression
> > > at the time statistics are collected.
> >
> > I've attached a new patch in this approache.
> >
> > This allows to collect statistics on virtual generated columns.
> >
> > During ANALYZE, generation expressions are expanded, and statistics are computed
> > using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData
> > are now exported from extended_stats.c. However, since they are no longer specific
> > to extended statistics, it might be better to move them to analyze.c and vacuum.h.
> >
> > To enable the optimizer to make use of these statistics, a new field named
> > virtual_gencols is added to RelOptInfo. This field holds the expressions of
> > virtual generated columns in the table. In examine_variable(), if an expression
> > in a WHERE clause matches a virtual generated column, the corresponding statistics
> > are used for that expression.
> >
> > Example:
> >
> > - Before applying the patch, the cardinality estimate is erroneous.
> >
> > test=# create table t (i int, j int generated always as (i*10) virtual);
> > CREATE TABLE
> > test=# insert into t select generate_series(1,1000);
> > INSERT 0 1000
> > test=# insert into t select 1 from generate_series(1,1000);
> > INSERT 0 1000
> > test=# analyze t;
> > ANALYZE
> > test=# explain analyze select * from t where j = 10;
> > QUERY PLAN
> > -------------------------------------------------------------------------------------------------
> > Seq Scan on t (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1)
> > Filter: ((i * 10) = 10)
> > Rows Removed by Filter: 999
> > Buffers: shared hit=9
> > Planning:
> > Buffers: shared hit=10
> > Planning Time: 0.299 ms
> > Execution Time: 0.948 ms
> > (8 rows)
> >
> >
> > - After applying the patch, the cardinality estimate is correct.
> >
> > test=# analyze t;
> > ANALYZE
> > test=# explain analyze select * from t where j = 10;
> > QUERY PLAN
> > -----------------------------------------------------------------------------------------------------
> > Seq Scan on t (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1)
> > Filter: ((i * 10) = 10)
> > Rows Removed by Filter: 999
> > Buffers: shared hit=9
> > Planning:
> > Buffers: shared hit=6
> > Planning Time: 0.374 ms
> > Execution Time: 1.028 ms
> > (8 rows)
> >
> >
> > Note that the patch is still a work in progress, so documentation and tests are not included.
>
> I've attached an updated patch.
>
> I modified the documentation to remove the statement that virtual generated columns
> do not have statistics.
>
> In addition, I added a test to ensure that statistics on virtual generated columns
> are available.
I've attached a rebased patch.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Allow-to-collect-statistics-on-virtual-generated-.patch | text/x-diff | 14.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kirill Reshke | 2025-08-20 05:11:15 | Re: Test instability when pg_dump orders by OID |
Previous Message | Kirill Reshke | 2025-08-20 05:03:10 | Re: VM corruption on standby |