From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: support create index on virtual generated column. |
Date: | 2025-07-24 10:46:39 |
Message-ID: | CACJufxHT-1a-uP9pEq6rpuvq8mS0V0xOHr30R4THsDeo5DyTUA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 23, 2025 at 4:54 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Corey Huinker <corey(dot)huinker(at)gmail(dot)com> writes:
> > I'm interested in this feature, specifically whether the optimizer uses the
> > index in situations where the expression is used rather than the virtual
> > column name.
>
> Hmm, I kinda think we should not do this. The entire point of a
> virtual column is that its values are not stored and so you can
> (for example) change the generation expression "for free".
> If it's referenced in an index that advantage goes out the window
> because we'll have to rebuild the index.
>
> Besides, this does nothing you haven't been able to do for
> decades with expression indexes.
>
hi.
CREATE TABLE example (regular_name text, lowecase_name text GENERATED
ALWAYS AS (lower(regular_name)) VIRTUAL);
CREATE INDEX example_b ON example(lowecase_name);
CREATE INDEX example_c ON example(lower(regular_name));
select distinct indnatts,indnkeyatts,indisunique,
indnullsnotdistinct,indisprimary,indisexclusion,indimmediate,indisclustered,indisvalid,
indcheckxmin,indisready,indislive,indisreplident,indkey,indcollation,indclass,indoption,
indexprs
from pg_index
where indrelid ='example'::regclass;
will return one row, meaning catalog table pg_index stored almost all
the same information.
For indexes example_b and example_c, the only difference lies in the new column
indattrgenerated. In example_b, indattrgenerated is not null, whereas in
example_c, it is null. This column (indattrgenerated) is needed to track
dependencies on generated columns, which is important for index
rebuild.
obviously, get_relation_info will collect the same information for
example_b, example_c.
which means the optimizer will use the same information to make the decision.
---------------------------------
set enable_seqscan to off;
set enable_bitmapscan to off;
CREATE TABLE example (regular_name text, lowecase_name text GENERATED
ALWAYS AS (lower(regular_name)) VIRTUAL);
CREATE INDEX example_b ON example(lowecase_name);
EXPLAIN(COSTS OFF) SELECT regular_name FROM example WHERE
lowecase_name = 'john q smith';
EXPLAIN(COSTS OFF) SELECT regular_name FROM example WHERE
lower(regular_name) = 'john q smith';
So current implementation, the above two query plans will produce the same query
plan. the generation expression or virtual generated column data type changes
will cause the index to rebuild.
Is this we want?
Or should changing the generation expression or data type of a virtual generated
column mark the associated index as invalid, without triggering a rebuild?
Attachment | Content-Type | Size |
---|---|---|
v6-0001-index-on-virtual-generated-column.patch | text/x-patch | 62.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2025-07-24 11:02:57 | Re: Fixing MSVC's inability to detect elog(ERROR) does not return |
Previous Message | Nazir Bilal Yavuz | 2025-07-24 10:34:30 | Re: Improve error reporting in 027_stream_regress test |