BUG #19525: In `contrib/dict_int`, handling a token whose first byte is a null byte causes `pnstrdup()` .

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: 3020001251(at)tju(dot)edu(dot)cn
Subject: BUG #19525: In `contrib/dict_int`, handling a token whose first byte is a null byte causes `pnstrdup()` .
Date: 2026-06-18 07:54:52
Message-ID: 19525-b0be8e4eb7dbaf07@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 19525
Logged by: Yuelin Wang
Email address: 3020001251(at)tju(dot)edu(dot)cn
PostgreSQL version: 19beta1
Operating system: Linux (Ubuntu 24.04, x86_64)
Description:

**Component**: `contrib/dict_int/dict_int.c`, function `dintdict_lexize()`
(line 109)

Requires a `SQL_ASCII`-encoded database (to bypass null-byte encoding
checks) and superuser to install the extension and create a helper function
that passes a `bytea` token directly to the lexize callback. Once the
dictionary is created, any role granted `EXECUTE` on the helper can trigger
the crash.

```sql
-- 1. Create SQL_ASCII database (null bytes are not rejected)
CREATE DATABASE vuln_ascii ENCODING 'SQL_ASCII' TEMPLATE template0;
\c vuln_ascii

-- 2. Install extension and create an intdict dictionary with
REJECTLONG=false
CREATE EXTENSION dict_int;
CREATE TEXT SEARCH DICTIONARY intdict_test (
TEMPLATE = intdict_template,
MAXLEN = 8192,
REJECTLONG = false
);

-- 3. Create a C helper (raw_lexize.so) that invokes the lexize callback
with
-- a raw bytea token, bypassing the text encoding layer.
CREATE FUNCTION raw_lexize(dict regdictionary, token bytea)
RETURNS text[] AS 'raw_lexize', 'raw_lexize' LANGUAGE C STRICT;

-- 4. Trigger: null byte at position 0 causes pnstrdup to allocate 1 byte,
-- but txt[8192] = '\0' writes 8191 bytes past the end of the allocation.
SELECT raw_lexize('intdict_test',
decode('00' || repeat('78', 10000), 'hex'));
-- Server closes connection; ASan reports heap-buffer-overflow WRITE of size
1
-- at dict_int.c:109 in dintdict_lexize.
```

ASan confirmation (server killed the backend; connection dropped):
```
==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x525000052880
WRITE of size 1 at 0x525000052880 thread T0
#0 in dintdict_lexize
/data/ylwang/Projects/postgres/contrib/dict_int/dict_int.c:109
#1 in FunctionCall4Coll .../src/backend/utils/fmgr/fmgr.c:1215
#2 in raw_lexize /tmp/raw_lexize.c:37
SUMMARY: AddressSanitizer: heap-buffer-overflow
.../contrib/dict_int/dict_int.c:109 in dintdict_lexize
```

`pnstrdup(ptr, len)` uses `strnlen(ptr, len)` internally, so when the token
begins with a null byte it allocates only 1 byte. The variable `len` is not
updated to reflect this and retains the original token length, so the guard
at line 98 (`if (len > d->maxlen)`) passes, and line 109 writes `'\0'` at
offset `d->maxlen` (e.g., 8192) into a 1-byte allocation.

The fix is to recompute the effective length from the allocated buffer after
the `pnstrdup` call, for example by replacing the `if (len > d->maxlen)`
check with `if (strlen(txt) > d->maxlen)`. This ensures the truncation
offset is always within the bounds of what `pnstrdup` actually allocated.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Langote 2026-06-18 09:15:44 Re: BUG #19484: Segmentation fault triggered by FDW
Previous Message PG Bug reporting form 2026-06-18 07:52:50 BUG #19524: In `contrib/btree_gist` float4/float8 GiST index operations, handling NaN values with raw C operator