Re: BUG #19525: In `contrib/dict_int`, handling a token whose first byte is a null byte causes `pnstrdup()` .

From: 王跃林 <violin0613(at)tju(dot)edu(dot)cn>
To: Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #19525: In `contrib/dict_int`, handling a token whose first byte is a null byte causes `pnstrdup()` .
Date: 2026-06-18 18:19:50
Message-ID: AG6A1wA9KnQT6LpEvIDG2Kpm.3.1781806790049.Hmail.3020001251@tju.edu.cn
Lists: pgsql-bugs

The fix looks correct. Recomputing len from the copy via strlen(txt) after pnstrdup() in dict_int directly addresses the root cause I reported. The dict_xsyn fix is also a clean approach since skipping the intermediate copy avoids the length mismatch entirely. Thank you for the patch!

王跃林
3020001251(at)tju(dot)edu(dot)cn

Original:
From: Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
Date: 2026-06-18 22:41:32 (China, GMT+08:00)
To: 3020001251 <3020001251(at)tju(dot)edu(dot)cn>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Cc:
Subject: Re: BUG #19525: In `contrib/dict_int`, handling a token whose first byte is a null byte causes `pnstrdup()` .

Hi,

On Thu, 18 Jun 2026 at 18:54, PG Bug reporting form <noreply(at)postgresql(dot)org> wrote:

The following bug has been logged on the website:

Bug reference: 19525
Logged by: Yuelin Wang
Email address: 3020001251(at)tju(dot)edu(dot)cn
PostgreSQL version: 19beta1
Operating system: Linux (Ubuntu 24.04, x86_64)
Description:

**Component**: `contrib/dict_int/dict_int.c`, function `dintdict_lexize()`
(line 109)

Reproducing this requires a `SQL_ASCII`-encoded database (to bypass null-byte
encoding checks) and superuser privileges to install the extension and create
a helper function that passes a `bytea` token directly to the lexize callback.
Once the dictionary is created, any role granted `EXECUTE` on the helper can
trigger the crash.

```sql
-- 1. Create SQL_ASCII database (null bytes are not rejected)
CREATE DATABASE vuln_ascii ENCODING 'SQL_ASCII' TEMPLATE template0;
\c vuln_ascii

-- 2. Install extension and create an intdict dictionary with REJECTLONG=false
CREATE EXTENSION dict_int;
CREATE TEXT SEARCH DICTIONARY intdict_test (
    TEMPLATE = intdict_template,
    MAXLEN = 8192,
    REJECTLONG = false
);

-- 3. Create a C helper (raw_lexize.so) that invokes the lexize callback with
-- a raw bytea token, bypassing the text encoding layer.
CREATE FUNCTION raw_lexize(dict regdictionary, token bytea)
RETURNS text[] AS 'raw_lexize', 'raw_lexize' LANGUAGE C STRICT;

-- 4. Trigger: null byte at position 0 causes pnstrdup to allocate 1 byte,
-- but txt[8192] = '\0' writes 8191 bytes past the end of the allocation.
SELECT raw_lexize('intdict_test',
                  decode('00' || repeat('78', 10000), 'hex'));
-- Server closes connection; ASan reports heap-buffer-overflow WRITE of size 1
-- at dict_int.c:109 in dintdict_lexize.
```

ASan confirmation (server killed the backend; connection dropped):
```
==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x525000052880
WRITE of size 1 at 0x525000052880 thread T0
#0 in dintdict_lexize /data/ylwang/Projects/postgres/contrib/dict_int/dict_int.c:109
#1 in FunctionCall4Coll .../src/backend/utils/fmgr/fmgr.c:1215
#2 in raw_lexize /tmp/raw_lexize.c:37
SUMMARY: AddressSanitizer: heap-buffer-overflow .../contrib/dict_int/dict_int.c:109 in dintdict_lexize
```

Thanks for the report and repro!

`pnstrdup(ptr, len)` uses `strnlen(ptr, len)` internally, so when the token
begins with a null byte it allocates only 1 byte. The variable `len` is not
updated to reflect this and retains the original token length, so the guard
at line 98 (`if (len > d->maxlen)`) passes, and line 109 writes `'\0'` at
offset `d->maxlen` (e.g., 8192) into a 1-byte allocation.
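
The length mismatch described above can be illustrated outside the server with a small stand-alone sketch. It is not the PostgreSQL source: `model_pnstrdup()` is a hypothetical stand-in that uses `malloc()` instead of `palloc()`, its `alloc_size` out-parameter exists only for illustration, and `truncation_write_is_out_of_bounds()` is an invented helper that reports whether the write would escape the allocation rather than performing it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for pnstrdup(): copies at most len bytes but stops
 * at the first NUL byte, like strnlen(in, len) would. alloc_size is an
 * illustration-only out-parameter reporting how many bytes were allocated.
 */
static char *
model_pnstrdup(const char *in, size_t len, size_t *alloc_size)
{
    const char *nul = memchr(in, '\0', len);
    size_t      n = nul ? (size_t) (nul - in) : len;
    char       *out = malloc(n + 1);

    if (out == NULL)
        abort();
    memcpy(out, in, n);
    out[n] = '\0';
    *alloc_size = n + 1;
    return out;
}

/*
 * Mirrors the reported pattern: the guard uses the caller's len, not the
 * length of the copy. Returns 1 if the subsequent txt[maxlen] = '\0' would
 * land outside the allocation, 0 otherwise. No out-of-bounds write is made.
 */
static int
truncation_write_is_out_of_bounds(const char *token, size_t len, size_t maxlen)
{
    size_t  alloc_size;
    char   *txt = model_pnstrdup(token, len, &alloc_size);
    int     oob = 0;

    assert(alloc_size >= 1);
    if (len > maxlen)                   /* stale length passes the guard */
        oob = (maxlen >= alloc_size);   /* index maxlen is past the copy */
    free(txt);
    return oob;
}
```

With a token whose first byte is NUL, the copy is one byte long while `len` still holds the original token length, so the helper reports an out-of-bounds index for any `maxlen` of at least 1.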

The fix is to recompute the effective length from the allocated buffer after
the `pnstrdup` call, for example by replacing the `if (len > d->maxlen)`
check with `if (strlen(txt) > d->maxlen)`. This ensures the truncation
offset is always within the bounds of what `pnstrdup` actually allocated.
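
A minimal sketch of that idea, again outside the server and under stated assumptions: `copy_and_truncate()` is a hypothetical name, `malloc()` replaces `palloc()`, and only the truncation path (the `REJECTLONG = false` case from the repro) is modelled. It is not the attached patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy a token the way pnstrdup() would (stopping at the first NUL byte),
 * then truncate to maxlen using the length of the COPY, not the caller's len.
 * Hypothetical helper for illustration only.
 */
static char *
copy_and_truncate(const char *token, size_t len, size_t maxlen)
{
    const char *nul = memchr(token, '\0', len);
    size_t      n = nul ? (size_t) (nul - token) : len;
    char       *txt = malloc(n + 1);

    assert(token != NULL);
    if (txt == NULL)
        abort();
    memcpy(txt, token, n);
    txt[n] = '\0';

    /*
     * strlen(txt) > maxlen implies maxlen < n, so index maxlen is always
     * inside the n + 1 bytes allocated above.
     */
    if (strlen(txt) > maxlen)
        txt[maxlen] = '\0';
    return txt;
}
```

For a token starting with a NUL byte the copy is the empty string, the guard is false, and no write happens. For an ordinary over-long token the result is cut to `maxlen` bytes as before.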

Your analysis seems right to me.

While looking around I think dict_xsyn may have a related issue: in
dxsyn_lexize() the token is copied with pnstrdup() and the original
length is then handed to str_tolower(), which reads that many bytes and
so could read past the shorter copy.
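
To make the over-read concrete, here is a rough stand-alone sketch. `lower_bytes()` is a hypothetical stand-in for a lowercasing routine that reads exactly the number of bytes it is given (it is not `str_tolower()` itself), and `bytes_read_past_copy()` is an invented helper that only computes how far such a read would run past a NUL-terminated copy.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a length-driven lowercasing routine: it reads
 * exactly len bytes from src, regardless of any embedded NUL byte.
 */
static char *
lower_bytes(const char *src, size_t len)
{
    char   *out = malloc(len + 1);

    assert(src != NULL);
    if (out == NULL)
        abort();
    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) src[i]);
    out[len] = '\0';
    return out;
}

/*
 * Number of bytes a len-byte read would run past a NUL-terminated copy of
 * token (the copy holds the bytes before the first NUL plus a terminator).
 */
static size_t
bytes_read_past_copy(const char *token, size_t len)
{
    const char *nul = memchr(token, '\0', len);
    size_t      copy_size = (nul ? (size_t) (nul - token) : len) + 1;

    return len > copy_size ? len - copy_size : 0;
}
```

Passing the original buffer together with its own length to the length-driven routine keeps the read in bounds, whereas pairing the original length with the shorter copy does not.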

Attaching a patch that fixes both the above issues.

Regards,
Ayush
