Re: Allow to collect statistics on virtual generated columns

From: Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Allow to collect statistics on virtual generated columns
Date: 2026-03-24 10:57:27
Message-ID: 20260324195727.43416b69c71b5f63eed3b4bc@sraoss.co.jp
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 24 Mar 2026 09:18:06 +0000
Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:

Thank you for your comments!

> On Wed, 31 Dec 2025 at 10:45, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
> >
> > I've attached an updated patch that fixes the broken test since 10c4fe074a.
> >
>
> I took a look at this, and the patch appears to work as intended.
> However, I have my doubts as to whether this is the best approach.
>
> Building stats on a virtual generated column is potentially quite
> expensive, and not something that everyone will want, so I think this
> really should be an optional feature that people can selectively
> enable, if they want.

I agree that collecting statistics on a virtual generated column should
be optional. While it can be disabled or enabled with:

ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ...

this is not very intuitive or discoverable for users, so a more explicit
interface might be preferable.

> Therefore, I think that the previous approach was probably better. If
> I'm understanding it correctly, that allowed
>
> CREATE STATISTICS stat_name ON virt_col FROM tbl;
>
> as well as allowing statistics to be built on expressions including
> virtual generated columns, making it more flexible.
>
> The problem with this previous approach was that it didn't work
> correctly if a virtual generated column's expression was changed using
> ALTER TABLE ... SET EXPRESSION. I think that could be solved by
> expanding generated column expressions at ANALYZE time, rather than at
> CREATE STATISTICS time.

Yes. I think this approach could work.

One concern is that CREATE STATISTICS does not allow creating extended
statistics on a single colmun. If we try this, it raises the following error:

ERROR: extended statistics require at least 2 columns

Therefore, if we take this approach, we would need to add an exception
to this rule for virtual columns.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2026-03-24 10:59:19 Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Previous Message Srinath Reddy Sadipiralla 2026-03-24 10:55:47 Re: Teach isolation tester about injection points in background workers