| From: | Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | 王跃林 <violin0613(at)tju(dot)edu(dot)cn>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, 3764353996 <3764353996(at)qq(dot)com> |
| Subject: | Re: Fw:Re: Fw: gbt_var_consistent in contrib/btree_gist/btree_utils_var.c has internal-node type confusion on the <> strategy, bypassing exclusion constraints |
| Date: | 2026-07-02 15:43:30 |
| Message-ID: | CAJTYsWX1C5JLj92QdqEcArWs4XtXPog+yvyb6=X=_cghjXnkYQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
Hi,
On Thu, 2 Jul 2026 at 20:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> =?UTF-8?B?546L6LeD5p6X?= <violin0613(at)tju(dot)edu(dot)cn> writes:
> > The analysis and fix look correct. The BtreeGistNotEqual branch is
> > the only strategy that bypasses the is_leaf check and passes
> > potentially truncated internal-node keys directly to f_eq, which is
> > unsafe for types like bit and varbit that require a minimum valid
> > header size. Returning true for internal nodes is the right
> > conservative choice and is consistent with how the other strategies
> > handle the internal-node case.
>
> None of this passes the smell test for me. If it's unsafe to
> call the type's f_eq function on a truncated key, how is it
> any safer to call the f_cmp function? IOW, why aren't *all* the
> cases in gbt_var_consistent() broken?
>
I think the difference is that for bit/varbit f_eq and f_cmp aren't the
same kind of function. f_eq is biteq (reads bit_len, does VARSIZE-8),
while f_cmp is byteacmp (plain byte-wise, only needs the varlena
header). On internal pages every other strategy goes through
f_cmp/byteacmp, and only the <> branch reaches f_eq. So I believe
that's why the byte-wise cases stay safe on a truncated key and <>
doesn't.
> It looks to me like gbt_var_node_truncate adjusts the length words
> of the truncated keys so that they are still valid, just shorter.
> Or at least it's trying to. That works fine for text and bytea,
> but it's not fine for bit/varbit because (a) it fails to update the
> "bit_len" field that follows the length word, and (b) the common
> prefix length selection logic doesn't know that it mustn't truncate
> away any part of the bit_len field. So my own thought about fixing
> this is that type bit needs a custom truncation method.
>
That was my first assumption too, but it looks like the node key has
no bit_len to preserve: gbt_bit_l2n/gbt_bit_xfrm seem to drop it and
keep only the raw bits before truncation runs. If so, a custom
truncation may not help, since the crash seems to come from <> calling
biteq rather than a damaged bit_len. Though I'm not fully sure I'm
reading the l2n step right.
> (I continue to regret that we ever accepted such underdocumented code.
> I think we ought to reverse-engineer a comment explaining what
> gbt_var_consistent is doing, eg, why are all of the tests seemingly
> reversed?)
>
Agreed, we can add a comment there.
Regards,
Ayush