From: | Japin Li <japinli(at)hotmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>, "Aya Iwata (Fujitsu)" <iwata(dot)aya(at)fujitsu(dot)com>, Timur Magomedov <t(dot)magomedov(at)postgrespro(dot)ru>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
Date: | 2025-07-14 10:24:44 |
Message-ID: | ME0P300MB04455377D5C3926CE2B47423B654A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM |
Lists: | pgsql-hackers |
On Mon, 14 Jul 2025 at 18:47, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Hi Japin,
>
> Thanks for your README questions.
>
> On Fri, Jul 11, 2025 at 7:18 PM Japin Li <japinli(at)hotmail(dot)com> wrote:
> ...
>>
>> 3.
>> In the README, 'TID' seems to have conflicting definitions:
>> Transaction ID (2.1) vs. tuple physical identifier (2.3.1).
>>
>> Could you confirm the intended meaning? Suggest using 'XID' for Transaction ID
>> if my understanding is correct.
>>
>
> Yes, TID was meant only for the Tuple identifier. Some terms became
> muddled. Hopefully, those are fixed now.
>
Thanks for your confirmation.
>> 4.
>> -1: TID relation (maps CRID to original TID)
>> -5: TID-CRID mapping table
>>
>> I'm trying to understand the distinctions here. Based on the definition in
>> vci_tidcrid.h, it seems plausible to use just one relation for the mapping,
>> suggesting a potential redundancy.
>>
>> /*
>>  * TID-CRID pair used for TIDCRID update list
>>  */
>> typedef struct vcis_tidcrid_pair_item
>> {
>>     ItemPointerData page_item_id;   /* TID on the original relation */
>>     vcis_Crid       crid;           /* CRID */
>> } vcis_tidcrid_pair_item_t;
>>
>> How are they different? I see the code in vci_tidcrid.c
>>
>
> AFAIK, the distinction is described by the code comments in vci_columns.h:
>
> +/** Column ID of special column */
> +#define VCI_COLUMN_ID_TID (-1)
> +#define VCI_COLUMN_ID_NULL (-2)
> +#define VCI_COLUMN_ID_DELETE (-3)
>
> So those are all special columns in the ROS data part. In other words,
> these internal relations all have data that is indexed by the CRID,
> e.g., “Delete vector” (2.3.3) and “Null information” (2.3.4). So here,
> the TID relation is the mapping from the CRID back to the original
> TID.
>
> On the other hand, the other relations...
>
> +/** The data below are not column-stored data.
> + * We prepare them for convenience.
> + */
> +#define VCI_COLUMN_ID_TID_CRID (-5)
> +#define VCI_COLUMN_ID_TID_CRID_UPDATE (-6)
> +#define VCI_COLUMN_ID_TID_CRID_WRITE (-7)
> +#define VCI_COLUMN_ID_TID_CRID_CDR (-8)
> +#define VCI_COLUMN_ID_DATA_WOS (-9)
> +#define VCI_COLUMN_ID_WHITEOUT_WOS (-10)
>
> … are not “column-stored”. In other words, these relations, including the
> “TID-CRID mapping table” (-5), are *not* indexed by CRID.
>
> You may be right about a potential redundancy. But right now we're
> focused on making these patches ready for open source - removing dead
> code to shrink the size, improving the PostgreSQL core interface, and
> fixing bugs. Rewriting or optimising the logic will have to wait.
>
>
Appreciate the detailed explanation! I'll dive deeper into it.
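
For illustration, the two groups of column IDs described above can be
sketched in C. The macro values are copied from the vci_columns.h excerpt
quoted in this message; the helper names (sketch_is_crid_indexed,
sketch_is_not_column_stored, sketch_self_check) are hypothetical and are
not part of the patch. The grouping follows the explanation in this thread,
not the actual code paths, and -4 is left out because the excerpt does not
show it.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the vci_columns.h excerpt quoted in this thread. */
#define VCI_COLUMN_ID_TID              (-1)
#define VCI_COLUMN_ID_NULL             (-2)
#define VCI_COLUMN_ID_DELETE           (-3)
#define VCI_COLUMN_ID_TID_CRID         (-5)
#define VCI_COLUMN_ID_TID_CRID_UPDATE  (-6)
#define VCI_COLUMN_ID_TID_CRID_WRITE   (-7)
#define VCI_COLUMN_ID_TID_CRID_CDR     (-8)
#define VCI_COLUMN_ID_DATA_WOS         (-9)
#define VCI_COLUMN_ID_WHITEOUT_WOS     (-10)

/*
 * Hypothetical helper (not in the patch): true for the special columns in
 * the ROS data part, whose data is described as indexed by CRID.
 */
bool
sketch_is_crid_indexed(int column_id)
{
    return column_id == VCI_COLUMN_ID_TID ||
           column_id == VCI_COLUMN_ID_NULL ||
           column_id == VCI_COLUMN_ID_DELETE;
}

/*
 * Hypothetical helper (not in the patch): true for the relations that the
 * header comment calls "not column-stored data" (-5 through -10).
 */
bool
sketch_is_not_column_stored(int column_id)
{
    return column_id <= VCI_COLUMN_ID_TID_CRID &&
           column_id >= VCI_COLUMN_ID_WHITEOUT_WOS;
}

/* Small self-check of the grouping as described in the thread. */
void
sketch_self_check(void)
{
    /* -1 (TID relation): CRID -> original TID, so it is CRID-indexed. */
    assert(sketch_is_crid_indexed(VCI_COLUMN_ID_TID));
    assert(!sketch_is_not_column_stored(VCI_COLUMN_ID_TID));

    /* -5 (TID-CRID mapping table): not indexed by CRID. */
    assert(!sketch_is_crid_indexed(VCI_COLUMN_ID_TID_CRID));
    assert(sketch_is_not_column_stored(VCI_COLUMN_ID_TID_CRID));
}
```

Under this reading, -1 and -5 fall into different groups because they are
looked up in opposite directions: -1 is keyed by CRID, while -5 is not.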
--
Regards,
Japin Li