| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Cc: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shvetamalik(at)gmail(dot)com> |
| Subject: | RE: [PATCH] Release replication slot on error in SQL-callable slot functions |
| Date: | 2026-05-27 11:46:51 |
| Message-ID: | TY4PR01MB17718F04D32E0073F0BC7EC0294082@TY4PR01MB17718.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wednesday, May 27, 2026 7:00 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, May 27, 2026 at 1:42 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> wrote:
> >
> > On Wed, May 27, 2026 at 1:31 PM SATYANARAYANA NARLAPURAM
> > <satyanarlapuram(at)gmail(dot)com> wrote:
> > > Thank you for the changes and review.
> >
> > Could pg_create_physical_replication_slot() still have the same issue
> > if it throws an error after ReplicationSlotCreate() and that error is
> > caught by a PL/pgSQL EXCEPTION block?
> >
> > Also, do maybe pg_copy_physical_replication_slot(),
> > pg_drop_replication_slot(), and ALTER_REPLICATION_SLOT potentially have
> the same issue as well?
> >
>
> pg_copy_physical_replication_slot() should not have it as the common
> 'copy_replication_slot' is already fixed in the patch. I will review the others.
I have one slight concern about the approach of releasing the slot within a
PG_CATCH() block in lots of functions. I'm not entirely sure if it's safe or
acceptable to do so before aborting the current transaction, so just to confirm
it once:
Since both ReplicationSlotRelease() and ReplicationSlotDropPtr() acquire
LWLocks, it's possible that a backend reports an ERROR while already holding
one of these locks, then enters the PG_CATCH() block and calls
ReplicationSlotRelease(), which attempts to acquire the same LWLock. However,
LWLocks do not distinguish between locks held by the same backend versus other
backends, so the backend could block forever and become uninterruptible.
I don't have a better alternative, but I think we can
evaluate once whether this is a real risk and if it's acceptable (perhaps the
scenario is rare enough to be acceptable). It may also be worth adding comments
to document this risk.
Best Regards,
Hou zj
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Zhijie Hou (Fujitsu) | 2026-05-27 11:50:16 | Fix race in ReplicationSlotRelease for ephemeral slots |
| Previous Message | Jakub Wartak | 2026-05-27 11:39:15 | log_postmaster_stats |