Re: PG19 FK fast path: OOB write and missed FK checks during batched

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Nikolay Samokhvalov <nik(at)postgres(dot)ai>
Cc: pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin(at)acm(dot)org>, Kirk Wolak <wolakk(at)gmail(dot)com>
Subject: Re: PG19 FK fast path: OOB write and missed FK checks during batched
Date: 2026-06-09 13:31:01
Message-ID: CA+HiwqHUz50YqJn4XiNsSLN2c+9eYBy1af=y_dfdJTsz5BmbJg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 8, 2026 at 5:18 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Sat, Jun 6, 2026 at 6:13 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > Thanks for the detailed report and reproducers. I’ve started looking into this.
>
> Continuing to look. Appended this to the open items list:
>
> https://wiki.postgresql.org/wiki/PostgreSQL_19_Open_Items#Open_Issues

Thanks again, Nik, for the thorough analysis and the reproducers --
they made all three defects easy to confirm and pin down. Patches
attached: 0001 for defect 1, 0002 for defects 2 and 3.

0001 (defect 1): check and flush before writing the row rather than
after, and add a per-entry "flushing" flag so a re-entrant add on the
same entry during a flush takes the per-row path instead of touching
the mid-flush batch. The flag is cleared in a PG_FINALLY, which also
resets batch_count, so the entry stays reusable if a flush error is
caught by a savepoint.
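
To make the ordering concrete, here is a rough standalone model of
the 0001 logic, not the patch code itself. BATCH_MAX, BatchEntry,
batch_add, batch_flush and demo_check are made-up names for
illustration, and the unconditional cleanup at the end of
batch_flush only stands in for the PG_FINALLY block:

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_MAX 4             /* hypothetical batch capacity */

typedef struct BatchEntry BatchEntry;
typedef bool (*CheckRowFn) (BatchEntry *entry, int row);

struct BatchEntry
{
    int         rows[BATCH_MAX];
    int         batch_count;
    bool        flushing;       /* set while a flush is in progress */
    int         per_row_checks; /* rows that took the per-row path */
    int         batched_checks; /* rows checked successfully by a flush */
    CheckRowFn  check_row;      /* hypothetical check; false means error */
};

/* Check every batched row; stop at the first error. */
bool
batch_flush(BatchEntry *entry)
{
    bool        ok = true;

    entry->flushing = true;
    for (int i = 0; i < entry->batch_count && ok; i++)
    {
        ok = entry->check_row(entry, entry->rows[i]);
        if (ok)
            entry->batched_checks++;
    }

    /* Stand-in for PG_FINALLY: runs on success and on error alike. */
    entry->flushing = false;
    entry->batch_count = 0;
    return ok;
}

bool
batch_add(BatchEntry *entry, int row)
{
    /* Re-entrant add during a flush: per-row path, batch untouched. */
    if (entry->flushing)
    {
        entry->per_row_checks++;
        return entry->check_row(entry, row);
    }

    /* Check and flush BEFORE writing, so the write is always in bounds. */
    if (entry->batch_count >= BATCH_MAX && !batch_flush(entry))
        return false;

    assert(entry->batch_count < BATCH_MAX);
    entry->rows[entry->batch_count++] = row;
    return true;
}

/*
 * Demo check (hypothetical): row 99 fails, and a negative row re-enters
 * batch_add() on the same entry with -row, as a trigger might.
 */
bool
demo_check(BatchEntry *entry, int row)
{
    if (row == 99)
        return false;
    if (row < 0)
        return batch_add(entry, -row);
    return true;
}
```

In this model, a fifth add on a full entry flushes first and then
writes into slot 0, and a failed flush still leaves batch_count at 0
with the flag cleared, so the next add on that entry works normally.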

0002 (defects 2 and 3): rather than track subxact membership per row,
confine batching to the top transaction level -- in RI_FKey_check,
when GetCurrentTransactionNestLevel() > 1, use the per-row path. I
went this way because per-entry subxact tracking isn't enough (one
entry's batch can mix rows from several levels, since the cache is
keyed by constraint), and flushing at subxact boundaries doesn't work
for deferred constraints. Once the cache only ever holds top-level
rows, a subxact abort has nothing of its own to discard, so
ri_FastPathSubXactCallback goes away -- that's what fixes your defect
2 reproducer. For defect 3, which is still reachable at the top level,
the same patch adds a cache-wide flag set while ri_FastPathEndBatch
iterates, so a re-entrant check during the scan takes the per-row path
instead of inserting into the cache being scanned.
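
As a rough standalone model of the 0002 routing (again not the patch
code): FastPathCache, fk_check, end_batch and CACHE_MAX are made-up
names, nest_level stands in for what GetCurrentTransactionNestLevel()
returns, and a negative row is just a way to simulate a check that
re-enters during the scan:

```c
#include <assert.h>
#include <stdbool.h>

#define CACHE_MAX 8             /* hypothetical cache capacity */

typedef struct FastPathCache
{
    int         pending[CACHE_MAX];
    int         npending;
    bool        end_batch_in_progress;  /* cache-wide flag */
    int         per_row_checks;
    int         batched_checks;
} FastPathCache;

/* Per-row path: the row is checked immediately and never cached. */
void
per_row_check(FastPathCache *cache, int row)
{
    (void) row;
    cache->per_row_checks++;
}

void
fk_check(FastPathCache *cache, int nest_level, int row)
{
    /*
     * Batch only at the top transaction level, and never while the cache
     * is being scanned.  The "cache full" fallback is a simplification of
     * this model only.
     */
    if (nest_level > 1 ||
        cache->end_batch_in_progress ||
        cache->npending >= CACHE_MAX)
    {
        per_row_check(cache, row);
        return;
    }
    cache->pending[cache->npending++] = row;
}

void
end_batch(FastPathCache *cache)
{
    int         nstart = cache->npending;

    cache->end_batch_in_progress = true;
    for (int i = 0; i < cache->npending; i++)
    {
        int         row = cache->pending[i];

        cache->batched_checks++;

        /* Simulated re-entrant check raised while scanning the cache. */
        if (row < 0)
            fk_check(cache, 1, -row);
    }

    /* Nothing was inserted into the cache while it was being scanned. */
    assert(cache->npending == nstart);

    cache->npending = 0;
    cache->end_batch_in_progress = false;
}
```

Since only nest_level == 1 rows ever reach pending[], a subxact abort
has nothing to remove from the model's cache, which mirrors why the
callback can go away.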

The per-row path still bypasses SPI, so checks that fall back to it
stay well ahead of the pre-19 check in terms of performance. I'd like
to recover batching across subtransactions properly in v20 but didn't
want to rush it now.

On defect 3, can you check whether your reproducer still commits the
orphan with 0002 applied, or whether (like on my build) it now raises
the violation? I'd like to be sure the bucket-placement variation you
hit is actually covered. And of course any review of the patches is
welcome.

--
Thanks, Amit Langote

Attachment                                                       Content-Type              Size
v1-0001-Fix-out-of-bounds-write-in-RI-fast-path-batch-on-.patch  application/octet-stream  10.0 KB
v1-0002-Confine-RI-fast-path-batching-to-the-top-transact.patch  application/octet-stream  10.9 KB
