RE: walsender performance regression due to logical decoding on standby changes

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "bharath(dot)rupireddyforpostgres(at)gmail(dot)com" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "bertranddrouvot(dot)pg(at)gmail(dot)com" <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "pgsql(at)j-davis(dot)com" <pgsql(at)j-davis(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: RE: walsender performance regression due to logical decoding on standby changes
Date: 2023-05-24 05:53:51
Message-ID: OS0PR01MB57164219C1D38CB3E0D8ACE094419@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, May 23, 2023 1:53 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-05-22 12:15:07 +0000, Zhijie Hou (Fujitsu) wrote:
> > About "a backend doing logical decoding", do you mean the case when a user
> > start a backend and invoke pg_logical_slot_get_changes() to do the logical
> > decoding ? If so, it seems the logical decoding in a backend won't be waked
> > up by startup process because the backend won't be registered as a
> > walsender so the backend won't be found in WalSndWakeup().
>
> I meant logical decoding happening inside a walsender instance.
>
>
> > Or do you mean the deadlock between the real logical walsender and startup
> > process ? (I might miss something) I think the logical decoding doesn't lock
> > the target user relation when decoding because it normally can get the
> > needed information from WAL.
>
> It does lock catalog tables briefly. There's no guarantee that such locks are
> released immediately. I forgot the details, but IIRC there's some outfuncs
> (enum?) that intentionally delay releasing locks till transaction commit.

Thanks for the explanation!

I understand that the startup process can take a lock on a catalog (when
replaying a record), which may conflict with a lock held by the walsender.

But in the walsender, I think we only start a transaction after entering
ReorderBufferProcessTXN(), and the transaction started there is ended soon
after processing and outputting the decoded transaction's data, which
releases the locks it acquired (as the comment in ReorderBufferProcessTXN()
says: "all locks acquired in here to be released, not reassigned to the
parent and we do not want any database access have persistent effects.").
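
To illustrate what I mean by the locks being scoped to that internal
transaction, here is a toy model. It is not PostgreSQL code: ToyTxn,
toy_begin(), toy_lock(), toy_abort() and toy_decode_one_txn() are made-up
names, and the lock ids are arbitrary. It only shows the pattern of "take
locks inside a transaction, then end the transaction so nothing stays held".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a PostgreSQL API. */
#define MAX_LOCKS 8

typedef struct ToyTxn
{
    bool   in_progress;
    int    held[MAX_LOCKS];   /* ids of locks held by this transaction */
    size_t nheld;
} ToyTxn;

static void
toy_begin(ToyTxn *txn)
{
    assert(txn != NULL);
    txn->in_progress = true;
    txn->nheld = 0;
}

/* Record a lock as held; fails outside a transaction or when full. */
static bool
toy_lock(ToyTxn *txn, int lock_id)
{
    if (!txn->in_progress || txn->nheld >= MAX_LOCKS)
        return false;
    txn->held[txn->nheld++] = lock_id;
    return true;
}

/* Ending the transaction drops every lock taken inside it. */
static void
toy_abort(ToyTxn *txn)
{
    txn->nheld = 0;
    txn->in_progress = false;
}

/*
 * Model of decoding one transaction: locks are taken only between
 * toy_begin() and toy_abort(). Returns the number of locks still held.
 */
static size_t
toy_decode_one_txn(ToyTxn *txn, const int *catalog_ids, size_t n)
{
    toy_begin(txn);
    for (size_t i = 0; i < n; i++)
        (void) toy_lock(txn, catalog_ids[i]);
    /* ... the decoded changes would be output here ... */
    toy_abort(txn);
    return txn->nheld;
}
```

In this model, calling toy_decode_one_txn() with any set of lock ids leaves
no locks held afterwards, which is the property I am relying on above.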

Besides, while processing and outputting the decoded transaction, the
walsender doesn't wait to be woken up by the startup process (e.g. in
WalSndWaitForWal()); it only waits while data is being sent to the
subscriber. So it seems the lock conflict here can't cause a deadlock for
now, although there may be a risk if we change this logic later. Sorry if I
missed something, and thanks again for your patience in explaining.
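
The argument above can be sketched as a wait-for graph: a deadlock needs a
cycle, and as far as I can see the walsender never has an edge back to the
startup process while it holds catalog locks. The code below is only a toy
model under that assumption; the process names, the waits_for matrix and
has_deadlock() are made up for illustration and are not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical participants, not PostgreSQL structures. */
enum { STARTUP, WALSENDER, SUBSCRIBER, NPROCS };

/* waits_for[a][b] is true when a is blocked until b makes progress. */
static bool
reaches(bool waits_for[NPROCS][NPROCS], int from, int target,
        bool visited[NPROCS])
{
    assert(from >= 0 && from < NPROCS);
    for (int next = 0; next < NPROCS; next++)
    {
        if (!waits_for[from][next])
            continue;
        if (next == target)
            return true;
        if (!visited[next])
        {
            visited[next] = true;
            if (reaches(waits_for, next, target, visited))
                return true;
        }
    }
    return false;
}

/* A deadlock exists if some process transitively waits for itself. */
static bool
has_deadlock(bool waits_for[NPROCS][NPROCS])
{
    for (int p = 0; p < NPROCS; p++)
    {
        bool visited[NPROCS] = { false };

        if (reaches(waits_for, p, p, visited))
            return true;
    }
    return false;
}
```

With only the edges STARTUP -> WALSENDER (lock conflict) and WALSENDER ->
SUBSCRIBER (waiting on the send), has_deadlock() reports no cycle. Adding a
WALSENDER -> STARTUP edge, i.e. waiting for a wakeup while still holding the
lock, creates one, which is the risk I mentioned if the logic changes later.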

Best Regards,
Hou zj
