| From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
|---|---|
| To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Cc: | 3020001251(at)tju(dot)edu(dot)cn |
| Subject: | BUG #19525: In `contrib/dict_int`, handling a token whose first byte is a null byte causes `pnstrdup()` . |
| Date: | 2026-06-18 07:54:52 |
| Message-ID: | 19525-b0be8e4eb7dbaf07@postgresql.org |
| Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 19525
Logged by: Yuelin Wang
Email address: 3020001251(at)tju(dot)edu(dot)cn
PostgreSQL version: 19beta1
Operating system: Linux (Ubuntu 24.04, x86_64)
Description:
**Component**: `contrib/dict_int/dict_int.c`, function `dintdict_lexize()`
(line 109)
Reproducing this requires a `SQL_ASCII`-encoded database (to bypass null-byte
encoding checks) and superuser privileges to install the extension and create
a helper function that passes a `bytea` token directly to the lexize callback.
Once the dictionary is created, any role granted `EXECUTE` on the helper can
trigger the crash.
```sql
-- 1. Create SQL_ASCII database (null bytes are not rejected)
CREATE DATABASE vuln_ascii ENCODING 'SQL_ASCII' TEMPLATE template0;
\c vuln_ascii
-- 2. Install extension and create an intdict dictionary with REJECTLONG=false
CREATE EXTENSION dict_int;
CREATE TEXT SEARCH DICTIONARY intdict_test (
    TEMPLATE = intdict_template,
    MAXLEN = 8192,
    REJECTLONG = false
);
-- 3. Create a C helper (raw_lexize.so) that invokes the lexize callback with
-- a raw bytea token, bypassing the text encoding layer.
CREATE FUNCTION raw_lexize(dict regdictionary, token bytea)
RETURNS text[] AS 'raw_lexize', 'raw_lexize' LANGUAGE C STRICT;
-- 4. Trigger: null byte at position 0 causes pnstrdup to allocate 1 byte,
-- but txt[8192] = '\0' writes 8191 bytes past the end of the allocation.
SELECT raw_lexize('intdict_test',
decode('00' || repeat('78', 10000), 'hex'));
-- Server closes connection; ASan reports heap-buffer-overflow WRITE of size 1
-- at dict_int.c:109 in dintdict_lexize.
```
ASan confirmation (server killed the backend; connection dropped):
```
==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x525000052880
WRITE of size 1 at 0x525000052880 thread T0
#0 in dintdict_lexize /data/ylwang/Projects/postgres/contrib/dict_int/dict_int.c:109
#1 in FunctionCall4Coll .../src/backend/utils/fmgr/fmgr.c:1215
#2 in raw_lexize /tmp/raw_lexize.c:37
SUMMARY: AddressSanitizer: heap-buffer-overflow .../contrib/dict_int/dict_int.c:109 in dintdict_lexize
```
`pnstrdup(ptr, len)` uses `strnlen(ptr, len)` internally, so when the token
begins with a null byte it copies zero bytes and allocates only 1 byte (the
terminator). The variable `len` is not updated to reflect this and retains the
original token length, so the check at line 98 (`if (len > d->maxlen)`)
evaluates to true, and line 109 writes `'\0'` at offset `d->maxlen` (e.g.,
8192) into a 1-byte allocation.
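The mismatch between `len` and the real allocation size can be sketched outside the server. The following is a minimal standalone illustration, not PostgreSQL code: `dup_bounded()` is a hypothetical stand-in for `pnstrdup()` built on `malloc`, `bounded_len()` mirrors `strnlen()` using `memchr`, and `truncation_out_of_bounds()` only reports whether the write would be out of bounds rather than performing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Portable equivalent of strnlen(): bytes before the first NUL, capped at len. */
static size_t bounded_len(const char *s, size_t len)
{
    const char *nul = memchr(s, '\0', len);

    return nul ? (size_t) (nul - s) : len;
}

/*
 * Hypothetical stand-in for pnstrdup(): copies at most len bytes, stopping
 * at the first NUL, and reports the allocation size through *alloc_size.
 */
static char *dup_bounded(const char *in, size_t len, size_t *alloc_size)
{
    size_t n = bounded_len(in, len);
    char  *out = malloc(n + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, in, n);
    out[n] = '\0';
    *alloc_size = n + 1;
    return out;
}

/*
 * Returns 1 if the guard "len > maxlen" would be true while the write
 * txt[maxlen] = '\0' falls outside the allocation, 0 otherwise, and -1 on
 * allocation failure. The write itself is never performed here.
 */
static int truncation_out_of_bounds(const char *in, size_t len, size_t maxlen)
{
    size_t alloc_size;
    char  *txt;
    int    oob;

    assert(in != NULL);
    txt = dup_bounded(in, len, &alloc_size);
    if (txt == NULL)
        return -1;
    oob = (len > maxlen) && (maxlen >= alloc_size);
    free(txt);
    return oob;
}
```

For a 10001-byte token whose first byte is `'\0'` and a `maxlen` of 8192, the allocation is 1 byte while the guard still sees 10001, so the function returns 1. For an ordinary token such as `"12345"` with `maxlen` 3, it returns 0.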
The fix is to recompute the effective length from the allocated buffer after
the `pnstrdup` call, for example by replacing the `if (len > d->maxlen)`
check with `if (strlen(txt) > d->maxlen)`. This ensures the truncation
offset is always within the bounds of what `pnstrdup` actually allocated.
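A rough sketch of the proposed check, again as standalone C rather than a patch against `dict_int.c`: `copy_until_nul()` is a hypothetical stand-in for `pnstrdup()`, `trim_token_fixed()` is an illustrative name, and the `rejectlong` branch is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pnstrdup(): copy up to len bytes, stop at first NUL. */
static char *copy_until_nul(const char *in, size_t len)
{
    const char *nul = memchr(in, '\0', len);
    size_t      n = nul ? (size_t) (nul - in) : len;
    char       *out = malloc(n + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, in, n);
    out[n] = '\0';
    return out;
}

/*
 * Truncate the copied token to maxlen bytes, using the length of the copy
 * (not the caller-supplied len) for the guard. The caller frees the result.
 */
static char *trim_token_fixed(const char *in, size_t len, size_t maxlen)
{
    char *txt;

    assert(in != NULL);
    txt = copy_until_nul(in, len);
    if (txt == NULL)
        return NULL;
    /* strlen(txt) < allocation size, so txt[maxlen] is in bounds here. */
    if (strlen(txt) > maxlen)
        txt[maxlen] = '\0';
    return txt;
}
```

With this guard, the null-led token yields an empty string and no write occurs, while a token like `"1234567"` with `maxlen` 3 is still trimmed to `"123"`.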