Re: 001_rep_changes.pl stalls

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 001_rep_changes.pl stalls
Date: 2020-04-20 05:30:08
Message-ID: dcdb33ba-4bcc-d41e-4aad-23207e195ccf@oss.nttdata.com
Lists: pgsql-hackers

On 2020/04/18 16:01, Noah Misch wrote:
> On Fri, Apr 17, 2020 at 05:06:29PM +0900, Kyotaro Horiguchi wrote:
>> At Fri, 17 Apr 2020 17:00:15 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
>>> By the way, if the latch is consumed in WalSndLoop, a subsequent call to
>>> WalSndWaitForWal cannot be woken up by the latch-set. Doesn't that
>>> cause missed wakeups (in other words, an overlooked latch wakeup)?
>>
>> - Since the only source other than timeout of walsender wakeup is latch,
>> - we should avoid wasteful consuming of latch. (It is the same issue
>> - with [1]).
>>
>> + Since walsender is wokeup by LSN advancement via latch, we should
>> + avoid wasteful consuming of latch. (It is the same issue with [1]).
>>
>>
>>> If the wakeup signal is not remembered on the walsender (like
>>> InterruptPending), WalSndPhysical cannot enter a sleep with
>>> confidence.
>
> No; per latch.h, "What must be avoided is placing any checks for asynchronous
> events after WaitLatch and before ResetLatch, as that creates a race
> condition." In other words, the thing to avoid is calling ResetLatch()
> without next examining all pending work that a latch would signal. Each
> walsender.c WaitLatch call does follow the rules.
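To make the quoted rule concrete, here is a tiny single-threaded model of it (the names are invented for illustration; this is not the real latch.c API). The point is that the reset comes first, and every condition the latch could signal is examined after it, so a wakeup arriving later sets the latch again and the next wait returns immediately:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the latch protocol quoted above (invented names, not
 * the real latch.c API): ResetLatch() must come BEFORE the checks for
 * pending work, never between a check and the next wait. */
static bool latch_is_set;
static bool work_pending;

static void model_set_latch(void)   { latch_is_set = true; }
static void model_reset_latch(void) { latch_is_set = false; }

/* One loop iteration following the rule: reset, then examine ALL
 * conditions the latch could have signaled. Returns true if it found
 * work, so the caller would not sleep this round. */
static bool service_round(void)
{
    model_reset_latch();
    if (work_pending)
    {
        work_pending = false;
        return true;
    }
    /* No work found after the reset, so sleeping now is safe: any
     * later model_set_latch() makes the next wait return at once. */
    return false;
}
```

In this model, a set_latch() arriving after service_round() has returned leaves the latch set, which is exactly what keeps the next wait from stalling.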
>
> On Sat, Apr 18, 2020 at 12:29:58AM +0900, Fujii Masao wrote:
>> On 2020/04/17 14:41, Noah Misch wrote:
>>> 1. Make XLogSendPhysical() more like XLogSendLogical(), calling
>>> WalSndWaitForWal() when no WAL is available. A quick version of this
>>> passes tests, but I'll need to audit WalSndWaitForWal() for things that are
>>> wrong for physical replication.
>>
>> (1) makes even the physical replication walsender sleep in two places,
>> which seems to make the code for physical replication more complicated
>> than necessary. I'd like to avoid (1) if possible.
>
> Good point.
>
>>> 2. Make XLogSendLogical() more like XLogSendPhysical(), returning when
>>> insufficient WAL is available. This complicates the xlogreader.h API to
>>> pass back "wait for this XLogRecPtr", and we'd then persist enough state to
>>> resume decoding. This doesn't have any advantages to make up for those.
>>>
>>> 3. Don't avoid waiting in WalSndLoop(); instead, fix the stall by copying the
>>> WalSndKeepalive() call from WalSndWaitForWal() to WalSndLoop(). This risks
>>> further drift between the two wait sites; on the other hand, one could
>>> refactor later to help avoid that.
>>
>> Since the additional call of WalSndKeepalive() is necessary only for
>> logical replication, shouldn't it be copied to, e.g., XLogSendLogical()
>> instead of WalSndLoop()? For example, when XLogSendLogical() sets
>> WalSndCaughtUp to true, it could call WalSndKeepalive().
>
> We'd send a keepalive even when pq_flush_if_writable() can't empty the output
> buffer. That could be acceptable, but it's not ideal.
>
>> The root problem seems to be that when a WAL record that's a no-op for
>> logical rep is processed, a keepalive message is not sent immediately,
>> even though we want pg_stat_replication to be updated promptly.
>
> The degree of promptness should be predictable, at least. If we removed the
> WalSndKeepalive() from WalSndWaitForWal(), pg_stat_replication updates would
> not be prompt, but they would be predictable. I do, however, think prompt
> updates are worthwhile.
>
>> (3) seems to address this problem directly and looks better to me.
>>
>>> 4. Keep the WalSndLoop() wait, but condition it on !logical. This is the
>>> minimal fix, but it crudely punches through the abstraction between
>>> WalSndLoop() and its WalSndSendDataCallback.
>>
>> (4) also looks good because it's simple, if we can redesign those
>> functions into good shape.
>
> Let's do that. I'm attaching the replacement implementation and the revert of
> v1.

Thanks for the patch! Though referencing XLogSendLogical inside WalSndLoop()
might be a bit ugly, I'm fine with this change because it's simple and easy
to understand.

+ * Block if we have unsent data. XXX For logical replication, let
+ * WalSndWaitForWal(), handle any other blocking; idle receivers need
+ * its additional actions. For physical replication, also block if
+ * caught up; its send_data does not block.

Might it be better to s/WalSndWaitForWal()/send_data()/? Not only
WalSndWaitForWal() but also WalSndWriteData() seems to handle the blocking,
and WalSndWriteData() is also called under send_data, i.e., XLogSendLogical(),
as this backtrace shows:

frame #2: 0x0000000106bcfa84 postgres`WalSndWriteData(ctx=0x00007fb2a4812d20, lsn=22608080, xid=488, last_write=false) at walsender.c:1247:2
frame #3: 0x0000000106b98295 postgres`OutputPluginWrite(ctx=0x00007fb2a4812d20, last_write=false) at logical.c:540:2
frame #4: 0x00000001073fe9b8 pgoutput.so`send_relation_and_attrs(relation=0x00000001073ba2c0, ctx=0x00007fb2a4812d20) at pgoutput.c:353:2
frame #5: 0x00000001073fe7a0 pgoutput.so`maybe_send_schema(ctx=0x00007fb2a4812d20, relation=0x00000001073ba2c0, relentry=0x00007fb2a483aa60) at pgoutput.c:315:2
frame #6: 0x00000001073fd4c0 pgoutput.so`pgoutput_change(ctx=0x00007fb2a4812d20, txn=0x00007fb2a502e428, relation=0x00000001073ba2c0, change=0x00007fb2a5030428) at pgoutput.c:394:2
frame #7: 0x0000000106b99094 postgres`change_cb_wrapper(cache=0x00007fb2a482ed20, txn=0x00007fb2a502e428, relation=0x00000001073ba2c0, change=0x00007fb2a5030428) at logical.c:753:2
frame #8: 0x0000000106ba2200 postgres`ReorderBufferCommit(rb=0x00007fb2a482ed20, xid=488, commit_lsn=22621088, end_lsn=22621136, commit_time=640675460323211, origin_id=0, origin_lsn=0) at reorderbuffer.c:1653:7
frame #9: 0x0000000106b93c10 postgres`DecodeCommit(ctx=0x00007fb2a4812d20, buf=0x00007ffee954c0f8, parsed=0x00007ffee954bf90, xid=488) at decode.c:637:2
frame #10: 0x0000000106b92fa9 postgres`DecodeXactOp(ctx=0x00007fb2a4812d20, buf=0x00007ffee954c0f8) at decode.c:245:5
frame #11: 0x0000000106b92aee postgres`LogicalDecodingProcessRecord(ctx=0x00007fb2a4812d20, record=0x00007fb2a4812fe0) at decode.c:114:4
frame #12: 0x0000000106bd2a16 postgres`XLogSendLogical at walsender.c:2863:3
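For what it's worth, the blocking rule the new comment describes can be condensed into a small decision function (a sketch with an invented helper name, not the actual walsender.c logic): block whenever there is unsent data, and additionally block on caught-up only for physical replication, since the logical path blocks inside send_data instead:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch (invented helper, not walsender.c) of the WalSndLoop() wait
 * condition described in the quoted comment. */
static bool
should_block(bool have_unsent_data, bool is_logical, bool caught_up)
{
    if (have_unsent_data)
        return true;            /* wait for the receiver to drain it */
    if (!is_logical && caught_up)
        return true;            /* physical: its send_data never blocks */
    return false;               /* logical: send_data handles blocking */
}
```

The asymmetry in the second test is the whole point of approach (4): only the physical walsender needs WalSndLoop() itself to sleep when caught up.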

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
