Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Timur Magomedov <t(dot)magomedov(at)postgrespro(dot)ru>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, "Aya Iwata (Fujitsu)" <iwata(dot)aya(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Date: 2025-06-25 03:13:25
Message-ID: CAHut+PtBPBQLBXFfVv2DTeZt_oMhGaZ-jH8tAbVC=M7KdMC9AQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 20, 2025 at 4:09 AM Timur Magomedov
<t(dot)magomedov(at)postgrespro(dot)ru> wrote:
>
> Hello Peter!
>
> Thank you for working on VCI updates.
> Here are some proposals for small improvements:

Hi Timur, Thanks for your feedback!

The newly posted v9-0002 addresses some of your comments. Details are below.

>
> Since hothash feature lives in separate patch now, vci_hothash.o should
> be removed from vci/executor/Makefile of VCI-module-main patch.

Fixed. Thanks. I also fixed a similar problem and removed vci_mp_rle.o
from storage/Makefile.

>
> 0001-Avoid-magic-numbers-in-vci_is_supported_type.patch
> I've looked at vci_supported_types.c and tried to rewrite it without
> using magic constants. Removed call to vci_check_supported_types() from
> SQL code since there is no need now to check that OID constants still
> match same types.
> Using defined constants for OIDs seems more robust than plain numbers.
> There were 23 type OIDs supported by VCI, all of them are there in a
> new version of code. I've replaced binary search by simple switch-case
> since compilers optimize it into nice binary decision tree. Previous
> version was binary searching among 87 structs. New version only
> searches among 23 integers.

Hmm. IIUC the “supported types” code was written this way for the
following reason. Out of an over-abundance (?) of caution, the
original patch authors wanted to call the function
'vci_check_supported_types' as a compatibility check, to discover
up-front (during CREATE EXTENSION) if any of the current PostgreSQL
OID definitions differ in any way since when the VCI extension was
built.
e.g.
- if vci build supports the type oid, but the oid value in current
PostgreSQL no longer exists then raise an error
- if vci build supports the type oid but the oid now has a different
meaning (typname mismatch) then raise an error

In other words, even though these type OIDs are in a range that is
supposed to be stable/unlikely to change, I think original VCI
developers were deliberately cautious about any chance of OID values
changing in later PostgreSQL versions.

So, for the compatibility checking we still need to keep that function
'vci_check_supported_types', and therefore we also still need to keep
the struct because that is where the known oid/name mapping (at time
of VCI extension build) is held which is used for the checking.

The current code is written so that when building a new VCI you only
need to execute:
"SELECT oid, typname FROM pg_type WHERE typnamespace = 11 AND typrelid
= 0 AND typelem = 0 ORDER BY oid;"
for the current PostgreSQL then substitute in the necessary true/false
for support.

I agree this means the resulting vci_is_supported_type() is not as
efficient as your implementation, but OTOH it is perhaps easier to
maintain and check against new PostgreSQL releases?

Also, you'll encounter all the same problems with the supported
functions logic of vci_supported_funcs.c -- those have similar logic,
so maybe it is better to keep them similar despite inefficiencies?

FYI, I have re-executed the SQL
SELECT oid, typname FROM pg_type WHERE typnamespace = 11 AND typrelid
= 0 AND typelem = 0 ORDER BY oid;
and this exposed a few changes since whenever this
vci_supported_type_table[] was last generated many years ago.

~~

AFAICT we could implement the vci_supported_type_table[] to *only*
include the types where supported is “true”. That would be more
efficient than now because then the entire array would only have 20ish
elements, but comes at a cost of perhaps being less easy to compare
with the executed SQL result.

Thoughts?

Also, while the vci_supported_type_table[] lookup is needed for the
compatibility check logic, I guess we could implement the
vci_is_supported_type() function exactly the same as your patch
switch code, to be called for the runtime checking (during CREATE
INDEX). That would be more efficient at runtime, but comes at a cost
of then having 2 functions to maintain.

Thoughts?

~~~

So, for now, I have only done the following:
- Updated vci_supported_type_table[] according to the current SQL result.
- Notice that “name” (NAMEOID) no longer qualifies as a VCI supported
type (because typelem is not 0 anymore) so the test code was modified
to remove the name column “c04”.
- In passing, I removed the costings from the EXPLAIN test output

>
> 0002-Updated-supported-funcs-SQL-for-recent-PostgreSQL.patch
> I wonder if it is possible to rewrite vci_supported_funcs.c in a
> similar way. There is a vci_supported_funcs.sql which I updated so it
> can be executed at master version of PostgreSQL.

OK. I included your changes in v9 and also added some IF EXISTS so the
.sql file can be run repeatedly without error. I also changed the
matching comment in the supported_funcs.c code where the SQL of this
file was described.

> I don't know if it is intentional, but safe_types list from
> vci_supported_funcs.sql have some differences to the types list from
> vci_supported_types.c. Specifically, it doesn't include varchar, varbit
> and uuid.

Thanks for reporting this! TBH, I also expected these lists should be the same.

I found multiple inconsistencies:
- Some items are in safe_types table but NOT in the
vci_supported_type_table[]
- Some Items are in the vci_supported_type_table[] but are
NOT in the safe_types table

Furthermore (and maybe partly caused by content of safe_types), the
results of running vci_support_func.sql also differs quite a lot also
from the vci_supported_func_table [] (in vci_supported_funcs.c), which
also does not seem right to me.
- Some items are in the SQL script results but are NOT in the
vci_supported_func_table []
- Some Items are in the vci_supported_func_table [] but are
NOT in the SQL script results

So, this remains an open problem. I am investigating if there is any
history/explanation of these differences. IIUC, then I hope later to
fix the safe_types, and then also fix the vci_supported_func_table[]
based on the result of that SQL script.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2025-06-25 03:42:44 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Previous Message Zhijie Hou (Fujitsu) 2025-06-25 03:08:33 RE: Conflict detection for update_deleted in logical replication