From: | proels(at)web(dot)de |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #9420: WAL-Sender repeatedly killed |
Date: | 2014-03-03 12:53:42 |
Message-ID: | 20140303125342.2159.79464@wrigleys.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 9420
Logged by: Stefan Pröls
Email address: proels(at)web(dot)de
PostgreSQL version: 9.2.7
Operating system: opensuse 11.4
Description:
I'm unsure whether this is really a bug, but at least the log messages don't
look like expected behavior.
We are running async. streaming replication between 2 PostgreSQL 9.2.7
servers. Last night the slave failed to reboot nicely because of a
filesystem inconsistency which required remote hands. So there was a delay
of about 30 hours until the slave was up an running again. Thus, it had to
catch up quite some replication delay. It did succeed doing so -- however,
according to the log messages the WAL sender process was killed several
times during this period. From the master's log:
2014-03-03 13:02:30 CET [unbekannt] standby LOG: breche WAL-Sender-Prozess
ab wegen Zeitüberschreitung bei der Replikation
2014-03-03 13:05:08 CET [unbekannt] standby LOG: breche WAL-Sender-Prozess
ab wegen Zeitüberschreitung bei der Replikation
[several more times]
>From the slave's log:
2014-03-03 12:47:18 CET FATAL: konnte keine Daten an den WAL-Stream
senden: Server beendete die Verbindung unerwartet
Das heißt wahrscheinlich, dass der Server abnormal beendete
bevor oder während die Anweisung bearbeitet wurde.
2014-03-03 13:00:34 CET LOG: ungültige magische Zahl 0000 in Logdatei 17,
Segment 111, Offset 3538944
2014-03-03 13:00:35 CET LOG: Streaming-Replikation hat erfolgreich mit
primärem Server verbunden
2014-03-03 13:03:07 CET FATAL: konnte keine Daten an den WAL-Stream
senden: Server beendete die Verbindung unerwartet
Das heißt wahrscheinlich, dass der Server abnormal beendete
bevor oder während die Anweisung bearbeitet wurde.
2014-03-03 13:04:04 CET LOG: ungültige magische Zahl 0000 in Logdatei 17,
Segment 185, Offset 5111808
2014-03-03 13:04:04 CET LOG: Streaming-Replikation hat erfolgreich mit
primärem Server verbunden
[also repeated serverali more times]
It appears the slave had trouble keeping up with the amount of data sent by
the master and, therefore, the wal-sender was repeatedly killed until the
slave finally catched up. I don't know where the "invalid magic number 0000"
comes from, though.
Note that the master has much faster disks (SSDs) than the slave
(SATA-HDDs), so some congestion on the master is likely to arise when the
master streams out more data than the slave can keep up with.
From | Date | Subject | |
---|---|---|---|
Next Message | muhammadk | 2014-03-03 13:52:15 | BUG #9422: PostgreSQL Zero downtime during deployement |
Previous Message | Sandro Santilli | 2014-03-03 08:31:09 | Re: uninterruptable regexp_replace in 9.2 and 9.3 |