Re: PG19 FK fast path: OOB write and missed FK checks during batched

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>, Nikolay Samokhvalov <nik(at)postgres(dot)ai>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin(at)acm(dot)org>, Kirk Wolak <wolakk(at)gmail(dot)com>
Subject: Re: PG19 FK fast path: OOB write and missed FK checks during batched
Date: 2026-06-11 08:18:15
Message-ID: CAEG8a3JhZwvdeedTvQCvvXgVkeb6ENW2haiHfxnXkXbv3NMmjg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit,

On Wed, Jun 10, 2026 at 8:17 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> Hi Ayush,
>
> Thanks for the review.
>
> On Wed, Jun 10, 2026 at 7:09 PM Ayush Tiwari
> <ayushtiwari(dot)slg01(at)gmail(dot)com> wrote:
> > On Wed, 10 Jun 2026 at 14:02, Amit Langote <amitlangote09(at)gmail(dot)com> wrote
> >> Thanks for checking. I will review them a bit more closely before
> >> committing by Friday. Other reviews are welcome.
> >
> > Thanks for the patch!
> >
> > I read through v1-0001 and v1-0002 and tried them locally. I had a couple of
> > things I wanted to ask about.
> >
> > 1. The per-entry "flushing" flag and test coverage. If I'm reading the two
> > patches together correctly, with both applied the 64-row re-entry test in 0001
> > reaches the flush through ri_FastPathEndBatch(), where 0002's cache-wide
> > ri_fastpath_flushing guard already routes the re-entrant check to the per-row
> > path before it gets back into ri_FastPathBatchAdd(). Does that mean the
> > per-entry flag from 0001 isn't really exercised by that test once 0002 is in?
> > As far as I can tell you'd need the flush to fire from ri_FastPathBatchAdd()
> > itself (a 65th row) to reach it. I tried a 65-row variant (same FK, re-entrant
> > DML from the cast during the full-batch flush), including a case where the
> > re-entrant row was an orphan, and it seemed to do the right thing; the
> > per-row fallback still raised the violation. Would it be worth switching the
> > test to 65 rows, or adding that variant, so the per-entry guard is covered too?
> > Or am I missing a path where the committed test already hits it?
>
> You're right. With 0002 applied, the 64-row test reaches the flush
> through ri_FastPathEndBatch(), where the cache-wide
> ri_fastpath_flushing guard catches the re-entry before it returns to
> ri_FastPathBatchAdd(), so the per-entry flag is no longer exercised by
> that test. To hit the per-entry flag the flush has to fire from
> ri_FastPathBatchAdd() itself, which the 64-row case no longer does
> once the add and flush are reordered.
>
> Rather than bump the test to 65 rows, I'd prefer to keep the flush
> firing from ri_FastPathBatchAdd() at 64 by not reordering the add and
> flush, and prevent the OOB write by bounds-checking the write instead,
> as done in the attached updated 0001. A re-entrant add then can't
> overrun the array regardless of the flag, the per-entry flushing guard
> still routes the re-entry to the per-row path, and a 64-row statement
> flushes from ri_FastPathBatchAdd() on the 64th row, so the existing
> test exercises the per-entry guard.
>
> > 2. Resetting ri_fastpath_flushing. I noticed it's cleared only in the
> > PG_FINALLY of ri_FastPathEndBatch(), which does seem to cover the cases I could
> > think of. Since ri_FastPathXactCallback already NULLs ri_fastpath_cache and
> > clears ri_fastpath_callback_registered at transaction end, I wondered whether
> > it might be worth clearing ri_fastpath_flushing there too, just as cheap
> > insurance against some future path that leaves it set across transactions
> > though maybe that's unnecessary given the PG_FINALLY.
>
> Agreed, it's cheap and matches the existing resets there, so I've
> added it to ri_FastPathXactCallback() in v2-0002.
>
> > Other than the above queries, the patch looks good to me.
>
> Updated patches attached.

I only reviewed and applied patch 0001 on my local machine, and it
successfully fixed the crash.

One minor comment:

+ if (fpentry->flushing)
+ {
+ ri_FastPathCheck(riinfo, fk_rel, newslot);
+ return;
+ }

Would it be worth wrapping the condition with unlikely()? It seems
this branch is expected to be false in most cases, not a strong
opinion though.

>
> --
> Thanks, Amit Langote

--
Regards
Junwang Zhao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2026-06-11 08:28:53 Re: Commit Sequence Numbers and Visibility
Previous Message Ashutosh Sharma 2026-06-11 08:17:15 Re: Report bytes and transactions actually sent downtream