Re: GIN pageinspect support for entry tree and posting tree

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN pageinspect support for entry tree and posting tree
Date: 2025-12-29 12:51:21
Message-ID: CALdSSPgpD5RfPn5qMbozU4_SQpZAbG3V_=KdxV9YaEG9gX=qEA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 14 Oct 2025 at 01:43, Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
>
> Hi!
> I do frequently interact with gin indexes. I also frequently use
> pageinspect to check how data is stored in indixies. The latter is
> beneficial in both education and corruption fixing purposes.
>
> I was always wondering on why GIN pageinspect module lacks support for
> anything rather that Posting Tree leaf pages. So, I implemented
> extended GIN support.
>
> Patch includes two functions, gin_entrypage_items and gin_datapage_items.
>
> gin_entrypage_items is a function to work with entry tree. It supports
> both entry tree leaf and internal pages.
>
> GIN entry tree stores an index tuple with a single indexed column in
> single-index-column case.
> Otherwise, each tuple on GIN entry tree page is tuple of (attrnum,
> attrvalue), where the first item in tuple shows which indexes column
> this tuple refers to. GIN internal pages also contains downlink to
> other entry tree pages, while entry tree leaf page may contain
> compressed TIDs of the posting list, or downlink to posting tree.
>
>
> example of output on entry tree internal page of multi-column index:
>
> ```
>
> reshke=# select * from gin_entrypage_items(get_raw_page('x_i_j_idx',
> 1), 'x_i_j_idx'::regclass);
> itemoffset | downlink | tids | keys
> ------------+----------+------+------------------------------------
> 1 | (3,0) | {} | i=113
> 2 | (5,0) | {} | j=34173cb38f07f89ddbebc2ac9128303f
> 3 | (2,0) | {} | j=a0a080f42e6f13b3a2df133f073095dd
> 4 | (4,0) | {} | j=fc490ca45c00b1249bbe3554a4fdf6fb
> (4 rows)
>
> ```
>
> example of output on entry tree leaf page of multi-column index:
>
> ```
> reshke=# select * from gin_entrypage_items(get_raw_page('x_i_j_idx',
> 2), 'x_i_j_idx'::regclass);
> itemoffset | downlink | tids |
> keys
> ------------+----------------+------------------------------+------------------------------------
> 1 | (2147483696,3) | {"(1,39)","(1,40)","(2,1)"} |
> j=35f4a8d465e6e1edc05f3d8ab658c551
> 2 | (2147483696,3) | {"(4,10)","(4,11)","(4,12)"} |
> j=3636638817772e42b59d74cff571fbb3
> 3 | (2147483696,3) | {"(5,1)","(5,2)","(5,3)"} |
> j=3644a684f98ea8fe223c713b77189a77
> 4 | (2147483696,3) | {"(0,25)","(0,26)","(0,27)"} |
> j=37693cfc748049e45d87b8c7d8b9aacd
> 5 | (2147483696,3) | {"(3,33)","(3,34)","(3,35)"} |
> j=37a749d808e46495a8da1e5352d03cae
>
> ```
>
> downlink on the leaf page has a different meaning than on the internal
> page, but I didn't handle it any differently. In the example ouput,
> (2147483696,3) = ((1<<31) + 48, 3), meaning the next 48 bytes is the
> entry tree key, and after that there is 3 posting items,
> varbyte-encoded.
>
> I also tested this function on GIN index with nulls, it works. Let me
> know if I'm wrong, I know that GIN handles NULLS very differently, but
> I had trouble with it, which makes me think that I'm missing
> something.
>
> Also turns out this gin_entrypage_items actually works for fast lists
> (GIN_LIST and GIN_LIST_FULLROW pages). But output is something
> strange, I cannot validate is output is sane. For tids, I get values
> like
>
> ```
> 1 | (1,19) |
> {"(16782080,4096)","(16777216,14336)","(1280,0)","(469774336,0)","(65536,0)","(2734686208,57344)","(8389120,0)","(3372220424,65280)","(4294967295,65535)","(16711680,0)","(0,0)","(
> 33554688,0)","(614,0)","(67108864,1024)","(16777216,0)","(73793536,0)","(0,0)","(0,0)","(0,0)"}
> ```
>
>
>
>
> gin_datapage_items is for posting trees, but not leaf pages. For leaf
> pages, users are expected to still use the gin_leafpage_items
> function.
> Example output for gin_datapage_items:

In v2 I decided to simply reject these pages.

> ```
> reshke=# select * from gin_datapage_items(get_raw_page('x_i_j_idx',
> 43), 'x_i_j_idx'::regclass);
> itemoffset | downlink | item_tid
> ------------+----------+----------
> 1 | 124 | (162,12)
> 2 | 123 | (314,37)
> 3 | 251 | (467,23)
> 4 | 373 | (0,0)
> (4 rows)
>
> ```
>
> Patch still is very raw, many things to improve.
> Comments?
>
> --
> Best regards,
> Kirill Reshke

Attached v2 with minor fixes and new test cases in gin.sql.

--
Best regards,
Kirill Reshke

Attachment Content-Type Size
v2-0001-GIN-pageinspect-support-for-entry-tree-and-postin.patch application/octet-stream 16.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Mircea Cadariu 2025-12-29 12:45:03 Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication