Re: Eliminating SPI / SQL from some RI triggers - take 3

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Evan Montgomery-Recht <montge(at)mianetworks(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Eliminating SPI / SQL from some RI triggers - take 3
Date: 2026-04-08 09:58:23
Message-ID: CA+HiwqFdkJGBU9QmYkm6056SuhunGk7yFCuVP=2vBejJa+qhGg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
> <montge(at)mianetworks(dot)net> wrote:
> > The patch also adds a test module (test_spi_func) with a C function
> > that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
> > crash cannot be triggered from PL/pgSQL. The test exercises the
> > C-level SPI INSERT with multiple FK constraints, FK violations, and
> > nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).

I applied only the test module changes and it passes (without
crashing) even without your proposed fix. It seems that's because the
C function in test_spi_func calling SPI is using the same resource
owner as the parent SELECT. I think you'd need to create a resource
owner manually in the spi_exec() C function to reproduce the crash, as
done in the attached 0001, which contains the src/test changes
extracted from your patch modified as described, including renaming
the C function to spi_exec_sql().

Also, the test cases that call spi_exec() (_sql()) directly from a
SELECT don't actually exercise the crash path because there is no
outer trigger-firing loop active. query_depth is 0 inside the inner
SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
callback there anyway. The critical case requires spi_exec_sql() to be
called from inside an AFTER trigger, where query_depth > 0 causes the
guard to defer the callback past the inner resource owner's lifetime.
I've added that test case. I kept your original test cases as they
still provide useful coverage of C-level SPI FK behavior even if they
don't exercise the crash path specifically. Maybe your original
PostGIS test suite that hit the crash did have the right structure,
but that's not reflected in the patch as far as I can tell.

I've also renamed the module to test_spi_resowner to better reflect
what it's about.

For the fix, I have a different proposal. As you observed, the
query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
that the nested SPI's callbacks get called under the outer resource
owner, which may not be the same as the one that SPI used. I think it
was a mistake to have that early return in the first place. Instead we
could remember for each callback what firing level it should be called
at, so the nested SPI's callbacks fire before returning to the parent
level and parent-level callbacks fire when the parent level completes.
I have implemented that in the attached 0002 along with transaction
boundary cleanup of callbacks, which passes the check-world for me,
but I'll need to stare some more at it before committing.

Let me know if this also fixes your own in-house test suite or if you
have any other suggestions or if you think I am missing something.

--
Thanks, Amit Langote

Attachment Content-Type Size
v2-0001-Modified-test-suite-from-Evan-s-patch.patch application/octet-stream 17.1 KB
v2-0002-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch application/octet-stream 5.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2026-04-08 10:09:38 Re: [PATCH] Refactor SLRU to always use long file names
Previous Message Nisha Moond 2026-04-08 09:45:10 Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?