Re: Synchronization levels in SR

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronization levels in SR
Date: 2010-05-27 09:30:55
Message-ID: 1274952655.6203.4051.camel@ebony
Lists: pgsql-hackers

On Thu, 2010-05-27 at 16:35 +0900, Fujii Masao wrote:
> On Thu, May 27, 2010 at 3:21 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote:
> >> On Wed, May 26, 2010 at 10:20 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> > On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:
> >> >
> >> >> I guess that dropping the support of #3 doesn't reduce complexity
> >> >> since the code of #3 is almost the same as that of #2. Like
> >> >> walreceiver sends the ACK after receiving the WAL in #2 case, it has
> >> >> only to do the same thing after the WAL flush.
> >> >
> >> > Hmm, well the code for #3 is similar also to the code for #4. So if you
> >> > do #2, it's easy to do #2, #3 and #4 together.
> >>
> >> No. #4 requires the way of prompt communication between walreceiver and
> >> startup process, but #2 and #3 not. That is, in #4, walreceiver has to
> >> wake the startup process up as soon as it has flushed WAL. OTOH, the
> >> startup process has to wake walreceiver up as soon as it has replayed
> >> WAL, to request it to send the ACK to the master. In #2 and #3, the
> >> prompt communication from walreceiver to startup process, i.e., changing
> >> the poll loop in the startup process would also be useful for the data
> >> to be visible immediately on the standby. But it's not required.
> >
> > You need to pass WAL promptly on primary from backend to WALSender.
> > Whatever mechanism you use can also be reused symmetrically on standby
> > to provide #4. So not a problem.
>
> I cannot be so optimistic since the situation differs from one process
> to another.

This spurs some architectural thinking:

I think we need to drop the idea of waiting inside any of the
components. Any time we ask WALSender or WALReceiver to wait for an
acknowledgement we reduce throughput. So we should assume that they
will continue to work as quickly as possible.

The acknowledgement from standby can contain the latest xlog location of
WAL received, WAL written to disk and WAL applied, all by reading values
from shared memory. It's all the same, whether we send back 2 or 3 xlog
locations in the ack message.
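
To illustrate, here is a rough sketch in Python (not PostgreSQL code) of an
ack message that carries all three positions at once. The names StandbyAck
and ACK_FORMAT, and the wire layout of three 64-bit integers, are assumptions
made only for this example; real xlog locations and the real protocol may
look different.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout: three unsigned 64-bit xlog positions,
# network byte order. Not the actual replication protocol.
ACK_FORMAT = "!QQQ"


@dataclass(frozen=True)
class StandbyAck:
    """Latest standby positions, as they might be read from shared memory."""
    received: int  # WAL received by the standby
    flushed: int   # WAL written to disk on the standby
    applied: int   # WAL applied (replayed) on the standby

    def pack(self) -> bytes:
        # The message has a fixed size no matter which level the primary needs.
        return struct.pack(ACK_FORMAT, self.received, self.flushed, self.applied)

    @classmethod
    def unpack(cls, payload: bytes) -> "StandbyAck":
        return cls(*struct.unpack(ACK_FORMAT, payload))
```

Adding the third location costs 8 more bytes per message in this layout,
which is the sense in which sending 2 or 3 locations is "all the same".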

Who sends the ack message? Who receives it? Would it be easier to have
this happen in a second pair of processes: WALSynchroniser (on primary)
and WALAcknowledger (on standby)? WALAcknowledger would send back a
stream of ack messages with latest xlog positions. WALSynchroniser would
receive these messages and wake up sleeping backends. If we did that
then there'd be almost no change at all to existing code, just
additional code and processes for the sync case. Code would be separate
and there would be no performance concerns either.

Backends can then choose to wait until the xlog location they want has
been reached, which might be in the next acknowledgement message or in a
subsequent one. That also ensures that the logic for this is completely
on the master and the standby doesn't act differently, apart from
needing to start a WALAcknowledger process if sync rep is requested.
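
The master-side waiting could be sketched roughly as follows, again in
Python rather than PostgreSQL code. AckTracker, on_ack and wait_for are
hypothetical names, threads stand in for backends, and a condition variable
stands in for whatever wakeup mechanism would really be used. The point is
only that each backend picks its own level (received, flushed or applied,
i.e. #2, #3 or #4) and its own target location, while the process reading
acks never waits.

```python
import threading


class AckTracker:
    """Primary-side state updated by a (hypothetical) WALSynchroniser."""

    LEVELS = ("received", "flushed", "applied")

    def __init__(self):
        self._cond = threading.Condition()
        self._pos = {level: 0 for level in self.LEVELS}

    def on_ack(self, received, flushed, applied):
        """Record one ack message and wake every sleeping backend."""
        with self._cond:
            for level, value in zip(self.LEVELS, (received, flushed, applied)):
                # Positions only move forward; a stale ack changes nothing.
                self._pos[level] = max(self._pos[level], value)
            self._cond.notify_all()

    def wait_for(self, level, target, timeout=None):
        """Block until `level` has reached `target`.

        Returns True once the target is reached, False on timeout.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: self._pos[level] >= target, timeout
            )
```

A backend whose commit record ends at location 200 and which wants the
flush guarantee would call tracker.wait_for("flushed", 200). It returns on
the first ack that reports a flushed position of at least 200, whether that
is the next message or a later one.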

If you do choose to make #3 important, then I'd say you need to work out
how to make WALWriter active as well, so it can perform regular fsyncs,
rather than having WALReceiver wait across that I/O.

--
Simon Riggs www.2ndQuadrant.com
