Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Japin Li <japinli(at)hotmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>, "Aya Iwata (Fujitsu)" <iwata(dot)aya(at)fujitsu(dot)com>, Timur Magomedov <t(dot)magomedov(at)postgrespro(dot)ru>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Date: 2025-07-14 08:47:06
Message-ID: CAHut+PtF0Mu=QPhCyTuUJg0RuGSC7Vjr5f6rsasmr+SeMk7L2g@mail.gmail.com
Lists: pgsql-hackers

Hi Japin,

Thanks for your README questions.

On Fri, Jul 11, 2025 at 7:18 PM Japin Li <japinli(at)hotmail(dot)com> wrote:
...
>
> 3.
> In the README, 'TID' seems to have conflicting definitions:
> Transaction ID (2.1) vs. tuple physical identifier (2.3.1).
>
> Could you confirm the intended meaning? Suggest using 'XID' for Transaction ID
> if my understanding is correct.
>

Yes, TID was meant only for the tuple identifier. Some terms had become
muddled in the README. Hopefully, those are fixed now.

> 4.
> -1: TID relation (maps CRID to original TID)
> -5: TID-CRID mapping table
>
> I'm trying to understand the distinctions here. Based on the definition in
> vci_tidcrid.h, it seems plausible to use just one relation for the mapping,
> suggesting a potential redundancy.
>
> /*
> * TID-CRID pair used for TIDCRID update list
> */
> typedef struct vcis_tidcrid_pair_item
> {
> ItemPointerData page_item_id; /* TID on the original relation */
> vcis_Crid crid; /* CRID */
> } vcis_tidcrid_pair_item_t;
>
> How they are different? I see the code in vci_tidcrid.c
>

AFAIK, the distinction is described by the code comments in vci_columns.h:

+/** Column ID of special column */
+#define VCI_COLUMN_ID_TID (-1)
+#define VCI_COLUMN_ID_NULL (-2)
+#define VCI_COLUMN_ID_DELETE (-3)

So those are all special columns in the ROS data part. In other words,
these internal relations all hold data that is indexed by the CRID –
e.g. “Delete vector” (2.3.3) and “Null information” (2.3.4). So here,
the TID relation is the mapping from the CRID back to the original
TID.

On the other hand, the other relations...

+/** The data below are not column-stored data.
+ * We prepare them for convenience.
+ */
+#define VCI_COLUMN_ID_TID_CRID (-5)
+#define VCI_COLUMN_ID_TID_CRID_UPDATE (-6)
+#define VCI_COLUMN_ID_TID_CRID_WRITE (-7)
+#define VCI_COLUMN_ID_TID_CRID_CDR (-8)
+#define VCI_COLUMN_ID_DATA_WOS (-9)
+#define VCI_COLUMN_ID_WHITEOUT_WOS (-10)

… are not “column-stored”. In other words, these relations, including the
“TID-CRID mapping table” (-5), are *not* indexed by CRID.

You may be right about a potential redundancy. But right now we're
focused on making these patches ready for open source - removing dead
code to shrink the size, improving the PostgreSQL core interface, and
fixing bugs. Rewriting or optimising the logic will have to wait.

> 5.
> Typo in README.
> - Each extent can have its own independent compression dictionary or all
> extents can share a comon dictionary
> --> s/comon/common/g
>

Fixed.

~~~

Please see the updated README that I attached in the previous post.

======
Kind Regards,
Peter Smith.
Fujitsu Australia
