Re: BUG #19525: In `contrib/dict_int`, handling a token whose first byte is a null byte causes `pnstrdup()` .

From: Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
To: 3020001251(at)tju(dot)edu(dot)cn, pgsql-bugs(at)lists(dot)postgresql(dot)org
Date: 2026-06-18 14:41:32
Message-ID: CAJTYsWW7+aVAFFV3dxg1s-RrtwioYPj1qQ9o2oAM40MReTXzAg@mail.gmail.com
Lists: pgsql-bugs

Hi,

On Thu, 18 Jun 2026 at 18:54, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 19525
> Logged by: Yuelin Wang
> Email address: 3020001251(at)tju(dot)edu(dot)cn
> PostgreSQL version: 19beta1
> Operating system: Linux (Ubuntu 24.04, x86_64)
> Description:
>
> **Component**: `contrib/dict_int/dict_int.c`, function `dintdict_lexize()` (line 109)
>
> Requires a `SQL_ASCII`-encoded database (to bypass null-byte encoding
> checks) and superuser to install the extension and create a helper function
> that passes a `bytea` token directly to the lexize callback. Once the
> dictionary is created, any role granted `EXECUTE` on the helper can trigger
> the crash.
>
> ```sql
> -- 1. Create SQL_ASCII database (null bytes are not rejected)
> CREATE DATABASE vuln_ascii ENCODING 'SQL_ASCII' TEMPLATE template0;
> \c vuln_ascii
>
> -- 2. Install extension and create an intdict dictionary with REJECTLONG=false
> CREATE EXTENSION dict_int;
> CREATE TEXT SEARCH DICTIONARY intdict_test (
>     TEMPLATE = intdict_template,
>     MAXLEN = 8192,
>     REJECTLONG = false
> );
>
> -- 3. Create a C helper (raw_lexize.so) that invokes the lexize callback with
> -- a raw bytea token, bypassing the text encoding layer.
> CREATE FUNCTION raw_lexize(dict regdictionary, token bytea)
> RETURNS text[] AS 'raw_lexize', 'raw_lexize' LANGUAGE C STRICT;
>
> -- 4. Trigger: null byte at position 0 causes pnstrdup to allocate 1 byte,
> -- but txt[8192] = '\0' writes 8191 bytes past the end of the allocation.
> SELECT raw_lexize('intdict_test',
>                   decode('00' || repeat('78', 10000), 'hex'));
> -- Server closes connection; ASan reports heap-buffer-overflow WRITE of size 1
> -- at dict_int.c:109 in dintdict_lexize.
> ```
>
> ASan confirmation (server killed the backend; connection dropped):
> ```
> ==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x525000052880
> WRITE of size 1 at 0x525000052880 thread T0
> #0 in dintdict_lexize /data/ylwang/Projects/postgres/contrib/dict_int/dict_int.c:109
> #1 in FunctionCall4Coll .../src/backend/utils/fmgr/fmgr.c:1215
> #2 in raw_lexize /tmp/raw_lexize.c:37
> SUMMARY: AddressSanitizer: heap-buffer-overflow .../contrib/dict_int/dict_int.c:109 in dintdict_lexize
> ```

Thanks for the report and repro!

> `pnstrdup(ptr, len)` uses `strnlen(ptr, len)` internally, so when the token
> begins with a null byte it allocates only 1 byte. The variable `len` is not
> updated to reflect this and retains the original token length, so the guard
> at line 98 (`if (len > d->maxlen)`) passes, and line 109 writes `'\0'` at
> offset `d->maxlen` (e.g., 8192) into a 1-byte allocation.
>
> The fix is to recompute the effective length from the allocated buffer
> after the `pnstrdup` call, for example by replacing the `if (len > d->maxlen)`
> check with `if (strlen(txt) > d->maxlen)`. This ensures the truncation
> offset is always within the bounds of what `pnstrdup` actually allocated.
>

Your analysis seems right to me.
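
A minimal standalone sketch of the mechanism, assuming only that pnstrdup() sizes its copy with strnlen() as described above. `sketch_pnstrdup()` and `sketch_trim_token()` are hypothetical names, malloc() stands in for palloc(), and this is not the attached patch:

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for pnstrdup(): it copies at most len bytes but
 * stops at the first NUL, so the returned buffer can be much shorter
 * than len (a single byte when in[0] is NUL).
 */
static char *
sketch_pnstrdup(const char *in, size_t len)
{
	size_t		n = strnlen(in, len);
	char	   *out = malloc(n + 1);

	if (out == NULL)
		abort();
	memcpy(out, in, n);
	out[n] = '\0';
	return out;
}

/*
 * Hypothetical model of the REJECTLONG = false path in dintdict_lexize().
 * The buggy code tested the caller-supplied len; this version tests the
 * length of the copy, so txt[maxlen] is only written when the copy holds
 * more than maxlen bytes, i.e. when index maxlen is inside the buffer.
 */
static char *
sketch_trim_token(const char *in, size_t len, size_t maxlen)
{
	char	   *txt = sketch_pnstrdup(in, len);

	if (strlen(txt) > maxlen)	/* was: if (len > maxlen) */
		txt[maxlen] = '\0';
	return txt;
}
```

With the token from the report (a NUL followed by 10000 `x` bytes) and maxlen 8192, the copy is empty and no write happens; a token of 10000 digits is still trimmed to 8192 bytes.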

While looking around I think dict_xsyn may have a related issue: in
dxsyn_lexize() the token is copied with pnstrdup() and the original
length is then handed to str_tolower(), which reads that many bytes and
so could read past the shorter copy.
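
A similar sketch for the dict_xsyn case. `sketch_lower_n()` and `sketch_make_key()` are hypothetical helpers: the first, like the str_tolower() call described above, reads exactly as many bytes as its length argument says; the point is only that this bound should come from the copy, not from the original token length. Again this is an illustration, not the attached patch:

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical lowercasing helper that trusts its length argument and
 * reads exactly n bytes from s, NUL or not.
 */
static char *
sketch_lower_n(const char *s, size_t n)
{
	char	   *out = malloc(n + 1);

	if (out == NULL)
		abort();
	for (size_t i = 0; i < n; i++)
		out[i] = (char) tolower((unsigned char) s[i]);
	out[n] = '\0';
	return out;
}

/* Hypothetical model of the search-key step in dxsyn_lexize(). */
static char *
sketch_make_key(const char *in, size_t length)
{
	size_t		n = strnlen(in, length);
	char	   *temp = malloc(n + 1);	/* sized the way pnstrdup() sizes it */
	char	   *key;

	if (temp == NULL)
		abort();
	memcpy(temp, in, n);
	temp[n] = '\0';

	/*
	 * Passing "length" here would read past temp whenever the token
	 * contains a NUL; the copy's own length is the safe bound.
	 */
	key = sketch_lower_n(temp, strlen(temp));
	free(temp);
	return key;
}
```

For a 5-byte token `A b \0 C D`, the copy is "Ab" and the key is "ab"; nothing past the 3-byte copy is read.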

Attaching a patch that fixes both of the above issues.

Regards,
Ayush

Attachment: 0001-Fix-out-of-bounds-access-on-embedded-null-tokens-in-.patch (application/x-patch, 2.5 KB)
