Repeated requests for feedback in logical standby

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Cc: Marko Tiikkaja <marko(at)joh(dot)to>
Subject: Repeated requests for feedback in logical standby
Date: 2016-04-12 15:18:32
Message-ID: CABUevExsjROqDcD0A2rnJ6HK6FuKGyewJr3PL12pw85BHFGS2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I've been running some tests, and come across a weird behavior on shutdown
with logical replication, 9.5.2. It seems that on shutdown, a lot of extra
requests for feedback are sent from the walsender.

On my laptop I get about 3-15 requests (different each time), and on a
server with a bit more power but still very low activity, the number is
simply huge.

Basically what happens is:

* Replication works fine
* I have *zero* activity on the master since it got started, not even a
single psql
* I issue a fast shutdown on the master
* Nothing happens at all for a while
* Once pg_recvlogical times out and sends a status message things wake up
* The server sends a whole bunch of requests for status, which
pg_recvlogical responds to
* Eventually things shut down (on the proper server, I've had to kill
things because it takes too long - I'm unsure if it ever shuts down)

Log output from pg_recvlogical:

$ ./pg_recvlogical -v -p 5495 -S trivialslot -o include_transaction=on -v
-d postgres --start -f - -P test_decoding 2>&1 | ts '[%Y-%m-%d %H:%M:%S]'
[2016-04-12 17:14:31] pg_recvlogical: starting log streaming at 0/0 (slot
trivialslot)
[2016-04-12 17:14:31] pg_recvlogical: streaming initiated
[2016-04-12 17:14:31] pg_recvlogical: confirming write up to 0/0, flush to
0/0 (slot trivialslot)
[2016-04-12 17:14:41] pg_recvlogical: confirming write up to 0/18F9DE88,
flush to 0/18F9DE88 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DEF8,
flush to 0/18F9DEF8 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: disconnected; waiting 5 seconds to
try again

In this example, the server has:
2016-04-12 17:14:45 CEST LOG: received fast shutdown request
2016-04-12 17:14:45 CEST LOG: aborting any active transactions
2016-04-12 17:14:45 CEST LOG: autovacuum launcher shutting down
2016-04-12 17:14:45 CEST LOG: shutting down
2016-04-12 17:14:45 CEST LOG: database system is shut down

Note how it took a while (6 seconds) for pg_recvlogical to pick up that
anything was happening - looks like something might not have been flushed?

When running on the server where there's more data it was very clear though
- if I break the walsender with gdb, pg_recvlogical stops spitting out
debugging messages, and as soon as I "cont" in gdb it starts spitting them
out again, so there's a clear connection between the two. And it takes a
*lot* more than a few seconds.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2016-04-12 21:15:39 Re: BUG #14078: Excessive memory growth during nested loop in select
Previous Message Robert Haas 2016-04-12 13:00:57 Re: [HACKERS] Breakage with VACUUM ANALYSE + partitions