| From: | Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Subject: | pg_stash_advice: dump file with overlong stash name crashes worker in a restart loop |
| Date: | 2026-05-17 06:24:52 |
| Message-ID: | CAJTYsWWYhcEx1YqC=B331-Df9EpD8MxzwswWL0okz9LLCUUpBA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
While testing pg_stash_advice, I came across a
crash that's reachable through the persistence dump file:
If $PGDATA/pg_stash_advice.tsv contains a stash name of 64 bytes or more
(i.e. >= NAMEDATALEN), the worker hits an assertion in dshash_strhash()
on startup. The postmaster treats the SIGABRT as a crash, performs full
crash recovery, restarts the worker, and the worker hits the same
assertion again. The cluster never reaches a stable state until the
file is removed by hand.
SQL-level callers cannot reach this because pgsa_check_stash_name()
rejects long names. The persistence parser, however, accepts the name
straight from the file and feeds it into dshash_find_or_insert(), which
in turn calls dshash_strhash() configured with key_size = NAMEDATALEN.
That function asserts strlen(v) < size. Strictly speaking the file
parser was missing a length check that the SQL path already performs.
Minimal reproduction
====================
$ pg_ctl -D $PGDATA stop -m fast
$ python3 -c "print('stash\t' + 'a'*64)" > $PGDATA/pg_stash_advice.tsv
$ pg_ctl -D $PGDATA -l logfile start
Server log (excerpt, every ~1-2 seconds):
DEBUG: pg_stash_advice worker started
DEBUG: restoring advice stash "aaaa...aaaa"
TRAP: failed Assert("strlen((const char *) v) < size"),
File: "dshash.c", Line: 634
postgres: pg_stash_advice worker (ExceptionalCondition+0xbb)
postgres: pg_stash_advice worker (dshash_strhash+0x48)
postgres: pg_stash_advice worker (dshash_find_or_insert_extended+0x2e)
pg_stash_advice.so (pgsa_create_stash)
pg_stash_advice.so (pgsa_restore_stashes)
pg_stash_advice.so (pgsa_read_from_disk)
pg_stash_advice.so (pg_stash_advice_worker_main+0x1aa)
...
LOG: background worker "pg_stash_advice worker" was terminated by
signal 6: Aborted
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing
[worker comes back up, asserts again, ad infinitum]
In my testing this produced ~18 crashes in ~12 seconds before I removed
the file. Threshold is exact: 63 bytes is fine, 64 bytes crashes.
In a production (non-assertion) build the name would instead be silently
truncated to NAMEDATALEN-1 bytes by the dshash key-copy path, allowing
distinct long names that share a 63-byte prefix to collide in the stash
table.
The attached patch validates the parsed stash-name length in the same
place where the parser already raises ERRCODE_DATA_CORRUPTED on other
forms of syntax error, mirroring the existing length check in
pgsa_check_stash_name(). Not sure if TAP test too is needed for this, but
I've included one in the patch.
Regards,
Ayush
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-pg_stash_advice-reject-overlong-stash-names-when-loa.patch | application/octet-stream | 4.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Hüseyin Demir | 2026-05-17 09:10:36 | Re: log_checkpoints: count WAL segment creations from all processes |
| Previous Message | Andrei Lepikhov | 2026-05-17 06:03:15 | Re: Sequence Access Methods, round two |