| From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
|---|---|
| To: | Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, kyzevan23(at)mail(dot)ru, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19488: Standby connection fails after dropping on login event trigger enabled always |
| Date: | 2026-05-21 11:31:44 |
| Message-ID: | CAPpHfdszpu=qOtNBgeF=ijJLW3ODUA6YuJbKq6c0Do7R-CWXHg@mail.gmail.com |
| Lists: | pgsql-bugs |
On Thu, May 21, 2026 at 11:10 AM Ayush Tiwari
<ayushtiwari(dot)slg01(at)gmail(dot)com> wrote:
> On Wed, 20 May 2026 at 18:15, Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> wrote:
>>
>> Hi,
>>
>> On Wed, 20 May 2026 at 17:35, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> On Wed, May 20, 2026 at 8:31 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>>> >
>>> > On Wed, May 20, 2026 at 1:03 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> > > On Wed, May 20, 2026 at 1:37 PM Ayush Tiwari
>>> > > <ayushtiwari(dot)slg01(at)gmail(dot)com> wrote:
>>> > > > Thanks for the report and the precise repro.
>>> > >
>>> > > +1
>>> > >
>>> > > > Attached patch adds a RecoveryInProgress() check to skip the cleanup
>>> > > > branch on standbys.
>>> > >
>>> > > Thanks for investigating this issue and for the patch!
>>> > > The patch looks good to me.
>>> > >
>>> > > > I think this needs to be backpatched too.
>>> > >
>>> > > Yes. Seems this should be backpatched to v17, where login event triggers
>>> > > were introduced.
>>> >
>>> > I've added a tap test reproducing the bug. I'm going to push and
>>> > backpatch this to v17 if no objections.
>>>
>>> +# Wait for the standby to replay the CREATE/DROP catalog state. At
>>> +# this point the standby's pg_database.dathasloginevt is still true.
>>> +$primary->wait_for_replay_catchup($standby);
>>> +
>>> +# A new connection to the standby exercises EventTriggerOnLogin()'s
>>> +# cleanup branch. With the RecoveryInProgress() guard, that branch is
>>> +# skipped on the standby and the connection succeeds. Without it the
>>> +# session aborts with a FATAL about AccessExclusiveLock. Probing the
>>> +# flag itself via safe_psql is what triggers the cleanup path.
>>> +is( $standby->safe_psql(
>>> + 'postgres',
>>> + "SELECT dathasloginevt FROM pg_database WHERE datname = 'postgres'"),
>>> + 't',
>>> + 'standby accepts connection and reports dangling dathasloginevt');
>>>
>>> The test looks unstable to me. wait_for_replay_catchup() may connect to
>>> the primary to obtain the flush LSN, which could cause dathasloginevt to
>>> become false before the subsequent safe_psql() call on the standby.
>
>
> I had registered this in the commitfest and could see the CI bot failing:
> https://commitfest.postgresql.org/patch/6790/
>
>> my $drop_lsn = $primary->safe_psql(
>> 'postgres', q{
>> BEGIN;
>> DROP EVENT TRIGGER init_session;
>> DROP FUNCTION init_session();
>> COMMIT;
>> SELECT pg_current_wal_lsn();
>> });
>>
>> $primary->wait_for_catchup($standby, 'replay', $drop_lsn);
>
>
> Attaching v3 with this change on top of Alexander's changes.

I suggest another approach: create a separate test database and create
the login event trigger in it. wait_for_catchup() and the other helpers
connect to the 'postgres' database, so they wouldn't touch our test
database.

I also added a check that the flag is successfully cleared on both the
primary and the standby. One issue spotted there: the in-place heap
update doesn't issue a WAL flush. But I think that's minor, since the
WAL could be flushed by any subsequent operation.
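
For illustration, the separate-database idea could look roughly like the
sketch below. This is not the contents of the attached v4 patch: the
database name regress_login_evt, the node names, and the backup name are
assumptions made up for the example, and the flag-clearance checks are
left out. It only relies on PostgreSQL::Test::Cluster methods that the
quoted test already uses or that are commonly used in recovery TAP tests.

```perl
# Hypothetical sketch of the separate-database approach; not the v4 patch.
use strict;
use warnings FATAL => 'all';
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

my $primary = PostgreSQL::Test::Cluster->new('primary');
$primary->init(allows_streaming => 1);
$primary->start;

# Dedicated database (name is an assumption). Helper connections made by
# the test framework go to 'postgres', so they never log in here.
$primary->safe_psql('postgres', 'CREATE DATABASE regress_login_evt');

$primary->backup('bkp');
my $standby = PostgreSQL::Test::Cluster->new('standby');
$standby->init_from_backup($primary, 'bkp', has_streaming => 1);
$standby->start;

# Create and drop the login event trigger in one session, so no later
# login to the test database on the primary can clear the flag.
$primary->safe_psql(
	'regress_login_evt', q{
CREATE FUNCTION init_session() RETURNS event_trigger
  LANGUAGE plpgsql AS $$ BEGIN NULL; END $$;
CREATE EVENT TRIGGER init_session ON login
  EXECUTE FUNCTION init_session();
ALTER EVENT TRIGGER init_session ENABLE ALWAYS;
DROP EVENT TRIGGER init_session;
DROP FUNCTION init_session();
});

# This connects to 'postgres' on the primary, not to the test database.
$primary->wait_for_replay_catchup($standby);

# pg_database is a shared catalog, so the flag can be read from 'postgres'.
is( $standby->safe_psql(
		'postgres',
		"SELECT dathasloginevt FROM pg_database WHERE datname = 'regress_login_evt'"),
	't',
	'standby still has the dangling dathasloginevt flag');

# Logging in to the test database on the standby reaches the cleanup
# branch of EventTriggerOnLogin(); with the fix it must succeed.
is($standby->safe_psql('regress_login_evt', 'SELECT 1'),
	'1', 'standby accepts connection to the test database');

done_testing();
```

The point of the design is that only the test's own safe_psql() calls
ever connect to the test database, so the flag state there does not
depend on which connections the helper functions happen to open.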
------
Regards,
Alexander Korotkov
Supabase
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0001-Skip-pg_database.dathasloginevt-cleanup-on-standb.patch | application/octet-stream | 9.0 KB |