Re: Inconsistent DB data in Streaming Replication

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, 'Fujii Masao' <masao(dot)fujii(at)gmail(dot)com>, 'Andres Freund' <andres(at)2ndquadrant(dot)com>, 'Hannu Krosing' <hannu(at)2ndquadrant(dot)com>, 'Sameer Thakur' <samthakur74(at)gmail(dot)com>, 'Ants Aasma' <ants(at)cybertec(dot)at>, sthomas(at)optionshouse(dot)com, 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Samrat Revagade' <revagade(dot)samrat(at)gmail(dot)com>, 'PostgreSQL-development' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inconsistent DB data in Streaming Replication
Date: 2013-04-19 16:59:59
Message-ID: 1F024140-4EDE-47E1-A091-F1D962D70A98@phlo.org
Lists: pgsql-hackers

On Apr19, 2013, at 14:46 , Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> On Wed, Apr 17, 2013 at 12:49:10PM +0200, Florian Pflug wrote:
>> Fixing this on the receive side alone seems quite messy and fragile.
>> So instead, I think we should let the master send a shutdown message
>> after it has sent everything it wants to send, and wait for the client
>> to acknowledge it before shutting down the socket.
>>
>> If the client fails to respond, we could log a fat WARNING.
>
> ISTM the master should half close the socket, using shutdown(). That
> way the client receives an EOF and can still then send its reply to the
> master. Then when the master receives that it can close() completely.

Hm, there may be arbitrarily many reply requests within the unread
data in the socket's buffer, so waiting for just one reply won't work.
Also, to distinguish a slave which crashes while the master shuts down
from one that has received all WAL and flushed it, the slave should flush
all WAL and send a final reply before closing the socket.

So the master would, upon shutting down, close only its writing end
of the connection, and continue to receive replies until it sees EOF.
After all slaves have gone, the master would emit a WARNING for every
slave whose last logged flush position is earlier than the master's
idea of end-of-wal.

The slave would, upon seeing EOF, flush all its WAL, send a final
reply, and close() the socket.
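
To make the exchange concrete, here is a rough sketch in Python, using a
socketpair in place of a real replication connection. It is only meant to
illustrate the half-close handshake described above, not how walsender or
walreceiver would implement it. The 8-byte message format and the names
slave(), master_shutdown() and run() are made up for this sketch.

```python
import socket
import struct
import threading

# Hypothetical wire format for this sketch only (not PostgreSQL's protocol):
# every message is a single 8-byte big-endian unsigned integer.
#   master -> slave: WAL end position after this chunk
#   slave -> master: position the slave has flushed
MSG = struct.Struct("!Q")


def recv_exact(sock, n):
    """Read exactly n bytes; return None if EOF arrives before n bytes do."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def slave(sock, crash_before_final_reply=False):
    received = 0
    while True:
        data = recv_exact(sock, MSG.size)
        if data is None:  # EOF: the master closed its writing end
            break
        (received,) = MSG.unpack(data)
    if not crash_before_final_reply:
        flushed = received  # stand-in for flushing all WAL to disk
        sock.sendall(MSG.pack(flushed))  # final reply
    sock.close()


def master_shutdown(sock, wal_positions):
    for pos in wal_positions:
        sock.sendall(MSG.pack(pos))
    end_of_wal = wal_positions[-1]
    sock.shutdown(socket.SHUT_WR)  # half-close: the slave now sees EOF

    # Keep receiving replies until the slave closes its end.
    last_flush = 0
    while True:
        data = recv_exact(sock, MSG.size)
        if data is None:
            break
        (last_flush,) = MSG.unpack(data)
    sock.close()

    if last_flush < end_of_wal:
        return "WARNING: slave flushed up to %d, end-of-wal is %d" % (
            last_flush, end_of_wal)
    return None


def run(crash):
    m, s = socket.socketpair()
    t = threading.Thread(target=slave, args=(s, crash))
    t.start()
    warning = master_shutdown(m, [100, 200, 300])
    t.join()
    return warning
```

run(False) returns None, since the slave's final reply matches the master's
end-of-wal. run(True) simulates a slave that goes away without sending the
final reply, so the master returns the WARNING string instead.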

I'm not sure that relying on TCP's half-close feature has much benefit
over using a home-grown shutdown message, though. Anyway, the basic
shutdown protocol would be the same regardless of what exactly we use
to signal a shutdown.

BTW, I assume we'd only do this for smart shutdowns.

best regards,
Florian Pflug
