Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

From: Japin Li <japinli(at)hotmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>, "Aya Iwata (Fujitsu)" <iwata(dot)aya(at)fujitsu(dot)com>, Timur Magomedov <t(dot)magomedov(at)postgrespro(dot)ru>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Date: 2025-07-30 11:07:38
Message-ID: ME0P300MB0445827B6E9CC04E0FAEE446B624A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 29 Jul 2025 at 06:57, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Here are the latest v15 patches.
>
> Changes include:
>
> PATCH 0002.
> - README now says user should not tamper with VCI internal relations
> - fixes/test the VACUUM bug -- fix provided by Japin [1]
> - fixes/tests the reported segv for attempted REFRESH of VCI internal
> relation -- see [2 comment#1]
> - fixes/tests VCI internal relation dependency on the indexed table
> - simplifies code for PG_TEMP_FILES_DIR -- see [2 comment#2]
>

Hi Peter,

Thanks for updating the patches.

1.
I've identified another TRAP failure. Here are the reproduction steps:

rm -rf demo
initdb -D demo
cat <<EOF >>demo/postgresql.auto.conf
shared_preload_libraries = 'vci'
max_worker_processes = '20'
EOF

pg_ctl -D demo start

cat <<EOF | psql postgres
CREATE EXTENSION vci;
CREATE TABLE t (id int, info text);
CREATE INDEX ON t USING vci (id);
INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id;
REINDEX TABLE t;
REINDEX TABLE t;
EOF

The current VCI design doesn't support REINDEX, which is expected. But I've
discovered an unexpected issue: running REINDEX on a table a second time causes
an assertion failure.

2.
+Internal Relation Types:
+- -1: TID relation (maps CRID to original TID)
+- -2: NULL vector (bit array for NULL values)
+- -3: Delete vector (bit array for deleted records)
+- -5: TID-CRID mapping table
+- -9: Data WOS (buffered row data)
+- -10: Whiteout WOS (deletion markers)
+- 0-N: ROS column data relations (one per indexed column)
+
+Example:
+For a VCI index on sales(customer_id, amount, date):
+
+Generated relations include:
+vci_0000001234_00000_d → Column 0 data (customer_id)
+vci_0000001234_00001_d → Column 1 data (amount)
+vci_0000001234_00002_d → Column 2 data (date)
+vci_0000001234_65535_d → TID relation
+vci_0000001234_65534_d → NULL vector
+vci_0000001234_65533_d → Delete vector
+vci_0000001234_65531_m → TID-CRID mapping
+vci_0000001234_65527_d → Data WOS
+vci_0000001234_65526_d → Whiteout WOS

The README states that it generates the above relations, but there are
additional internal relations that are not mentioned.

SELECT relname, relkind FROM pg_class WHERE relname ~ 'vci*' ORDER BY relname;
relname | relkind
------------------------+---------
vci_0000016578_00000_d | m
vci_0000016578_00000_m | m
vci_0000016578_00001_d | m
vci_0000016578_00001_m | m
vci_0000016578_65526_d | m
vci_0000016578_65527_d | m
vci_0000016578_65530_0 | m
vci_0000016578_65530_1 | m
vci_0000016578_65531_d | m
vci_0000016578_65531_m | m
vci_0000016578_65533_d | m
vci_0000016578_65533_m | m
vci_0000016578_65534_d | m
vci_0000016578_65534_m | m
vci_0000016578_65535_d | m
vci_0000016578_65535_m | m

Based on the above, are the following materialized views unused, or is their
use just undocumented?

- vci_0000016578_00000_m
- vci_0000016578_00001_m
- vci_0000016491_65530_0
- vci_0000016578_65530_1
- vci_0000016578_65531_d
- vci_0000016578_65534_m
- vci_0000016578_65535_m

What is the purpose of the '0' and '1' suffixes?

3.
I've also found that the VCI index is not working. Is this the expected
behavior?

[local]:3209161 postgres=# \d+ t
Table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
info | text | | | | extended | | |
Indexes:
"t_id_idx" vci (id)
Access method: heap

[local]:3209161 postgres=# SET enable_seqscan TO off;
SET
[local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000;
QUERY PLAN
-----------------------------------------------------
Seq Scan on t (cost=0.00..2084.00 rows=1 width=37)
Disabled: true
Filter: (id = 1000)
(3 rows)

--
Regards,
Japin Li

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2025-07-30 11:09:01 Re: Support getrandom() for pg_strong_random() source
Previous Message Nazir Bilal Yavuz 2025-07-30 10:58:43 Re: Improve prep_buildtree