Re: Synchronous Log Shipping Replication

From: Zeugswetter Andreas OSB sIT <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronous Log Shipping Replication
Date: 2008-09-09 18:59:58
Message-ID: 6DAFE8F5425AB84DB3FCA4537D829A561CDFD1D070@M0164.s-mxs.net
Lists: pgsql-hackers


> > Don't understand. I am referring to the logic at the top of
> > AdvanceXLInsertBuffer(). We would need to wait for all people reading
> > the contents of wal_buffers.
>
> Oh, I see.
>
> If a slave falls behind, how does it catch up? I guess you're saying
> that it can't fall behind, because the master will block before that
> happens. Also in asynchronous replication? And what about
> when the slave
> is first set up, and needs to catch up with the master?

I think the WAL Sender needs the ability to read the WAL files directly.
In cases where it falls behind, or just started, it needs to be able to catch up.
So it seems we either need to copy the WAL buffer into local memory before sending,
or "lock" the WAL buffer until the send has finished.
Useful network timeouts are in the >= 5-10 sec range (even for a GbE LAN), so I don't
think locking WAL buffers is feasible. Thus the WAL sender needs to copy (the needed
portion of the current WAL buffer) before sending (or use an async send that returns
immediately once the buffer has been copied into the network stack).
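
A minimal sketch of the copy-then-send idea, in Python rather than PostgreSQL's C.
Everything here is hypothetical (WalBuffer, copy_from, ship are made-up names, and an
LSN is simply a byte offset into a buffer that is never recycled); it only illustrates
that the lock is held for the copy and not for the possibly slow network send.

```python
import threading


class WalBuffer:
    """Toy stand-in for shared wal_buffers (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = bytearray()

    def append(self, record: bytes) -> int:
        """Append a record and return the new end LSN (a byte offset here)."""
        with self._lock:
            self._data += record
            return len(self._data)

    def copy_from(self, lsn: int, max_bytes: int) -> bytes:
        """Copy up to max_bytes starting at lsn; the lock covers only the copy."""
        with self._lock:
            return bytes(self._data[lsn:lsn + max_bytes])


def ship(buf: WalBuffer, sent_lsn: int, send, chunk: int = 8192) -> int:
    """Copy the next chunk into local memory, then send it without any lock."""
    data = buf.copy_from(sent_lsn, chunk)  # short critical section
    if data:
        send(data)  # may block for seconds; writers are not held up
    return sent_lsn + len(data)
```

Because send() runs outside the lock, a writer can keep appending while a send is
stalled on the network, which is the reason for preferring the copy over the "lock".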

When the WAL sender is ready to continue, it either still finds the next WAL buffer
(or the rest of the current buffer), or it needs to fall back to Plan B and
read the WAL files again. A sync client could still wait for the replica, even if
the local WAL has already advanced massively. The checkpointer would need the LSN
info from the WAL senders so that it does not reuse any WAL files that are still needed,
although in that case it might be time to declare the replica broken.
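
The two decisions in this paragraph could be sketched roughly as follows. This is an
illustration in Python under assumptions, not a proposed implementation: next_chunk,
oldest_needed_lsn, read_wal_file and max_lag are all hypothetical names, LSNs are plain
byte offsets, and "broken" is modeled simply as lagging more than max_lag bytes.

```python
def next_chunk(sent_lsn, buffer_start_lsn, buffer, read_wal_file, max_bytes=8192):
    """Return (source, data) for the next bytes to ship after sent_lsn.

    buffer holds the WAL bytes starting at buffer_start_lsn. If the sender's
    position is still inside it, read from memory; otherwise fall back to
    "Plan B" and read from the WAL files via the supplied callback.
    """
    if sent_lsn >= buffer_start_lsn:
        off = sent_lsn - buffer_start_lsn
        return "buffer", bytes(buffer[off:off + max_bytes])
    # Plan B: read from file, but only up to where the buffer takes over.
    want = min(max_bytes, buffer_start_lsn - sent_lsn)
    return "file", read_wal_file(sent_lsn, want)


def oldest_needed_lsn(sender_lsns, max_lag, current_lsn):
    """LSN below which the checkpointer may reuse WAL files.

    Senders lagging more than max_lag bytes are treated as broken (an
    assumption of this sketch) and no longer hold back WAL reuse.
    """
    live = [lsn for lsn in sender_lsns if current_lsn - lsn <= max_lag]
    return min(live, default=current_lsn)
```

For example, with senders at LSN 100 and 900, a current LSN of 1000 and max_lag=500,
the sender at 100 is considered broken and WAL before 900 may be reused.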

Ideally the WAL sender also knows whether the client is waiting, so it can decide to send
part of a buffer. The WAL sender should wake and act whenever a "network packet"
full of WAL buffer is ready, regardless of commits. The send size can be whatever seems
appropriate here (it might be one WAL page).
The WAL sender should only need to expect a response when it has sent a commit record,
ideally only if a client is waiting (and once in a while, at least for every log switch).
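
One possible reading of these rules as code, again only a Python sketch: should_send
and needs_ack are hypothetical helpers, and SEND_UNIT = 8192 merely assumes that one
WAL page is the chosen send size.

```python
SEND_UNIT = 8192  # assumed send size: one WAL page


def should_send(pending_bytes: int, client_waiting: bool) -> bool:
    """Send when a full unit is ready, or a partial one if a client waits."""
    if pending_bytes >= SEND_UNIT:
        return True  # a full "network packet" is ready, regardless of commits
    return client_waiting and pending_bytes > 0


def needs_ack(has_commit: bool, client_waiting: bool, log_switch: bool) -> bool:
    """Expect a response only for awaited commits, and at every log switch."""
    return log_switch or (has_commit and client_waiting)
```

Under this reading, a commit record nobody is waiting for is shipped without asking
for a response, and the periodic response at log switch still tells the master how
far the replica has progressed.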

All in all a useful streamer seems like a lot of work.

Andreas
