Re: walsender vs. XLogBackgroundFlush during shutdown

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: walsender vs. XLogBackgroundFlush during shutdown
Date: 2019-05-01 15:02:20
Message-ID: 20190501150220.z27mt2y6gt3nssdf@development
Lists: pgsql-hackers

On Wed, May 01, 2019 at 02:13:10PM +0200, Alexander Kukushkin wrote:
> Hi Tomas,
>
>On Wed, 1 May 2019 at 02:28, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>> I see there's an ongoing discussion about race conditions in walreceiver
>> blocking shutdown, so let me start a new thread about a race condition in
>> walsender during shutdown.
>>
>> The symptoms are rather simple - 'pg_ctl -m fast shutdown' gets stuck,
>> waiting for walsender processes to catch-up and terminate indefinitely.
>
>I can confirm, during the past couple of years we observed such a
>problem a few times and this is really annoying.
>
>> The reason for that is pretty simple - the walsenders are doing logical
>> decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
>> But per XLogSendLogical() this only happens if
>>
>> if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
>> {
>> WalSndCaughtUp = true;
>> ...
>> }
>
>After a couple of days of investigating and debugging I came to a
>slightly different conclusion: WalSndCaughtUp is set to true in my
>case.
>Since I am mostly a python person, I decided to use psycopg2 for my
>investigation. I took an example from
>http://initd.org/psycopg/docs/advanced.html#logical-replication-quick-start
>as a starting point, created a slot and started the script.
>I wasn't running any transactions, therefore the DemoConsumer.__call__
>was never executed and cursor.send_feedback(flush_lsn=msg.data_start)
>was never called either. Basically, the only thing the psycopg2
>internals were doing was periodically sending keepalive messages or
>replying to keepalives sent by postgres.

OK, so that seems like a separate issue, somewhat unrelated to the one I
reported. And I'm not sure it's a walsender issue - it seems it might be
psycopg2 not reporting the flush position properly, no?
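
For context, here is a lightly abridged sketch of the shutdown check in
WalSndDone() in walsender.c (paraphrased rather than copied verbatim, so
the details may differ a bit between versions). It shows why a client
that never advances the flush/write position it reports will keep the
walsender - and hence the shutdown - waiting:

static void
WalSndDone(WalSndSendDataCallback send_data)
{
    XLogRecPtr  replicatedPtr;

    /* keep sending whatever is left */
    send_data();

    /*
     * Use the flush position reported by the client if it sends one,
     * otherwise fall back to the write position (for clients like
     * pg_receivewal that always report an invalid flush location).
     */
    replicatedPtr = XLogRecPtrIsInvalid(MyWalSnd->flush) ?
        MyWalSnd->write : MyWalSnd->flush;

    if (WalSndCaughtUp && sentPtr == replicatedPtr &&
        !pq_is_send_pending())
    {
        /* everything was confirmed by the client - we can exit */
        EndCommand("COPY 0", DestRemote);
        pq_flush();
        proc_exit(0);
    }

    /* otherwise keep nagging the client for a status update */
    if (!waiting_for_ping_response)
        WalSndKeepalive(true);
}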

>Actually, the same problem will happen even when the consumer script
>receives some messages, just not very intensively; it is simply a bit
>harder to reproduce.
>
>If you attach to the walsender with gdb, you'll see the following picture:
>(gdb) bt
>#0 0x00007fd6623d296a in __libc_send (fd=8, buf=0x55cb958dca08,
>len=94, flags=0) at ../sysdeps/unix/sysv/linux/send.c:28
>#1 0x000055cb93aa7ce9 in secure_raw_write (port=0x55cb958d71e0,
>ptr=0x55cb958dca08, len=94) at be-secure.c:318
>#2 0x000055cb93aa7b87 in secure_write (port=0x55cb958d71e0,
>ptr=0x55cb958dca08, len=94) at be-secure.c:265
>#3 0x000055cb93ab6bf9 in internal_flush () at pqcomm.c:1433
>#4 0x000055cb93ab6b89 in socket_flush () at pqcomm.c:1409
>#5 0x000055cb93dac30b in send_message_to_frontend
>(edata=0x55cb942b4380 <errordata>) at elog.c:3317
>#6 0x000055cb93da8973 in EmitErrorReport () at elog.c:1481
>#7 0x000055cb93da5abf in errfinish (dummy=0) at elog.c:481
>#8 0x000055cb93da852d in elog_finish (elevel=13, fmt=0x55cb93f32de3
>"sending replication keepalive") at elog.c:1376
>#9 0x000055cb93bcae71 in WalSndKeepalive (requestReply=true) at
>walsender.c:3358

Is it stuck in the send() call forever, or did you just happen to grab
this backtrace?

>
>All the above probably looks like a brain dump, but I don't think it
>conflicts with Tomas's findings; rather, it complements them.
>I am very glad that now I know how to mitigate the problem on the
>client side, but IMHO it is also very important to fix the server
>behavior if it is ever possible.
>

I think having a report of an issue, with a way to reproduce it, is a
first (and quite important) step. So thanks for doing that.

That being said, I think those are two separate issues, with different
causes and likely different fixes. I don't think fixing the xlog flush
will resolve your issue, and vice versa.

FWIW attached is a patch that I used to reliably trigger the xlog flush
issue - it simply adds LogLogicalMessage() to the commit handler in the
built-in output plugin. So all you need to do is create a subscription,
start generating commits, and trigger a restart. The message is 8kB, so
it's definitely long enough to overflow to the next xlog page.
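
To illustrate the idea (this is only a rough sketch, not the attached
patch verbatim - the payload buffer and message prefix below are made up
for illustration), the change amounts to something like this in the
commit callback of pgoutput:

/*
 * Sketch of the approach: emit a large non-transactional logical
 * message from the commit callback of the built-in pgoutput plugin.
 * Needs #include "replication/message.h" for LogLogicalMessage().
 */
static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                    XLogRecPtr commit_lsn)
{
    static char junk[8192];     /* 8kB payload (name made up here) */

    OutputPluginUpdateProgress(ctx);

    OutputPluginPrepareWrite(ctx, true);
    logicalrep_write_commit(ctx->out, txn, commit_lsn);
    OutputPluginWrite(ctx, true);

    /* the added bit: write an 8kB logical message at every commit */
    LogLogicalMessage("walsender-hang", junk, sizeof(junk), false);
}

Because the message is 8kB, it always spills onto the next xlog page,
which is what makes the hang reproducible.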

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
walsender-hang.patch text/plain 914 bytes
