Re: Inconsistent DB data in Streaming Replication

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com>, "'Andres Freund'" <andres(at)2ndquadrant(dot)com>, "'Hannu Krosing'" <hannu(at)2ndquadrant(dot)com>, "'Sameer Thakur'" <samthakur74(at)gmail(dot)com>, "'Ants Aasma'" <ants(at)cybertec(dot)at>, <sthomas(at)optionshouse(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Samrat Revagade'" <revagade(dot)samrat(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inconsistent DB data in Streaming Replication
Date: 2013-04-17 10:49:10
Message-ID: 53401A75-FE8F-4E8A-B3D2-ED2BED113AC2@phlo.org
Lists: pgsql-hackers

On Apr17, 2013, at 12:22 , Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> Do you mean to say that as an error has occurred, so it would not be able to
> flush received WAL, which could result in loss of WAL?
> I think even if error occurs, it will call flush in WalRcvDie(), before
> terminating WALReceiver.

Hm, true, but for that to prevent the problem, the inner processing
loop would always have to read up to EOF before it exits and we
attempt to send a reply, which I don't think it necessarily does.
Assume that the master sends a chunk of data, waits a bit, and finally
sends the shutdown record and exits. The slave might then receive
the first chunk, which might trigger sending a reply. By the time
the reply is sent, the master has already sent the shutdown record
and closed the connection, so we fail to reply and abort.
Since the shutdown record was never read from the socket,
XLogWalRcvFlush won't flush it, and the slave ends up behind the
master.
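
To make the ordering problem concrete, here is a toy in-memory model of
that sequence. It is only a sketch, not walreceiver code: ToyConnection,
run_receiver and the reply_before_eof flag are made-up names, and the
final "flush" merely stands in for what WalRcvDie() would do.

```python
from collections import deque


class ToyConnection:
    """Hypothetical stand-in for the replication socket on the slave."""

    def __init__(self, incoming, peer_closed):
        self.incoming = deque(incoming)  # data already sitting in our socket buffer
        self.peer_closed = peer_closed   # master has already closed its end

    def read_one(self):
        # Returns the next buffered message, or None at EOF.
        return self.incoming.popleft() if self.incoming else None

    def send_reply(self):
        if self.peer_closed:
            raise ConnectionError("master closed the connection")


def run_receiver(conn, reply_before_eof):
    """Return what the slave manages to flush before it terminates."""
    received = []
    try:
        while True:
            msg = conn.read_one()
            if msg is None:
                break
            received.append(msg)
            if reply_before_eof:
                conn.send_reply()  # may fail while data is still unread
        conn.send_reply()          # reply only after draining to EOF
    except ConnectionError:
        pass                       # receiver aborts; fall through to the flush
    # Stand-in for the flush on exit: only data that was read can be flushed.
    return list(received)


messages = ["chunk-1", "shutdown-record"]
behind = run_receiver(ToyConnection(messages, peer_closed=True), reply_before_eof=True)
caught_up = run_receiver(ToyConnection(messages, peer_closed=True), reply_before_eof=False)
print(behind)     # ['chunk-1']
print(caught_up)  # ['chunk-1', 'shutdown-record']
```

In both runs the reply fails, but only the run that replies before
reaching EOF leaves the shutdown record unread and therefore unflushed.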

Also, since XLogWalRcvProcessMsg responds to keepalive messages,
we might also error out of the inner processing loop if the server
closes the socket after sending a keepalive but before we attempt
to respond.

Fixing this on the receive side alone seems quite messy and fragile.
So instead, I think we should let the master send a shutdown message
after it has sent everything it wants to send, and wait for the client
to acknowledge it before shutting down the socket.

If the client fails to respond, we could log a fat WARNING.
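
A rough sketch of what such a handshake could look like, using a local
socket pair rather than the real replication protocol. The DONE and ACK
markers, the function names and the timeout value are all assumptions
made for illustration; nothing like them exists in the protocol today.

```python
import logging
import socket

DONE = b"DONE\n"  # hypothetical "I have sent everything" marker
ACK = b"ACK\n"    # hypothetical acknowledgement from the slave


def master_send_all(sock, pending):
    """Send all remaining data, then the end-of-stream marker."""
    for chunk in pending:
        sock.sendall(chunk)
    sock.sendall(DONE)


def master_wait_for_ack(sock, ack_timeout=1.0):
    """Wait for the slave's ACK before closing; warn if it never arrives."""
    sock.settimeout(ack_timeout)
    try:
        acked = sock.recv(len(ACK)) == ACK
    except (socket.timeout, ConnectionError):
        acked = False
    if not acked:
        logging.warning("slave did not acknowledge shutdown; it may be behind")
    sock.close()
    return acked


def slave_drain_and_ack(sock):
    """Read until DONE is seen, then acknowledge. Returns the payload read."""
    buf = b""
    while not buf.endswith(DONE):
        data = sock.recv(4096)
        if not data:
            return buf  # master closed without sending DONE
        buf += data
    sock.sendall(ACK)
    return buf[: -len(DONE)]


master, slave = socket.socketpair()
master_send_all(master, [b"chunk-1;", b"shutdown-record;"])
payload = slave_drain_and_ack(slave)
acked = master_wait_for_ack(master)
slave.close()
print(payload, acked)
```

The point of the design is that the master keeps the socket open until
the slave has confirmed it read everything, so a failed reply on the
slave can no longer hide unread data. A slave that never answers only
costs the master the timeout plus a warning.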

best regards,
Florian Pflug
