Re: Synchronous replication, reading WAL for sending

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Pavan Deolasee <pavan(dot)deolasee(at)enterprisedb(dot)com>
Subject: Re: Synchronous replication, reading WAL for sending
Date: 2008-12-23 16:48:42
Message-ID: 1230050922.4793.893.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-12-23 at 17:42 +0200, Heikki Linnakangas wrote:

> As the patch stands, whenever XLOG segment is switched in XLogInsert, we
> wait for the segment to be sent to the standby server. That's not good.
> Particularly in asynchronous mode, you'd expect the standby to not have
> any significant ill effect on the master. But in case of a flaky network
> connection, or a busy or dead standby, it can take a long time for the
> standby to respond, or the primary to give up. During that time, all WAL
> insertions on the primary are blocked. (How long is the default TCP
> timeout again?)

Ugh, didn't see that. Get rid of that. We managed to get rid of the
fsync of the control file at WAL file switch at the start of 8.3. That
had a major effect on performance, via reduced response time profiles.
No need to re-introduce a delay in the same place.
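
To illustrate the point about keeping the segment switch free of waits,
here is a minimal sketch in Python, not code from the patch. It assumes
a hypothetical send callable and a hypothetical SegmentShipper class:
the insert path only enqueues the finished segment name, and a
background thread does the (possibly slow) sending.

```python
import queue
import threading


class SegmentShipper:
    """Toy model: hand finished segments to a background sender thread."""

    def __init__(self, send):
        self._send = send              # hypothetical callable that ships one segment
        self._pending = queue.Queue()  # unbounded, so put() never blocks the caller
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_segment_switch(self, segment_name):
        # Called from the insert path: enqueue and return immediately.
        self._pending.put(segment_name)

    def _run(self):
        while True:
            name = self._pending.get()
            if name is None:           # sentinel: stop the sender
                break
            self._send(name)           # may block on a slow or dead standby

    def close(self):
        # Ask the sender to stop after draining what is already queued.
        self._pending.put(None)
        self._thread.join()
```

The unbounded queue is a simplification for the sketch; a real design
would need some limit on how far the sender may fall behind.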

> Another point is that in the future, we really shouldn't require setting
> up archiving and file-based log shipping using external scripts, when
> all you want is replication. It should be enough to restore a base
> backup on the standby, and point it to the IP address of the primary,
> and have it catch up. This is very important, IMHO. It's quite a lot of
> work to set up archiving and log-file shipping, for no obvious reason.
> It's really only needed at the moment because we're building this
> feature from spare parts.

Happy for that to be hidden more from users.

> For those reasons, we need a way to send arbitrary ranges of WAL from
> primary to standby. The current method where the WAL is read from
> wal_buffers obviously only works for very recent WAL pages that are
> still in wal_buffers. The design should be changed so that instead of
> reading from wal_buffers, the WAL is read from filesystem.

There are two basic ways: from memory and from files. Sure we can hide
the two mechanisms in code better, but they will remain fairly distinct.

> Sending directly from wal_buffers can be provided as a fastpath when
> sending recent enough WAL range, but I wouldn't bother complicating the
> code for now.

Sounds like you are saying we should completely replace
write-from-buffers with write-from-file?

Sending from wal_buffers is OK if wal_buffers is large enough. If
streaming replication falls so far behind that we have problems, then
there are larger issues to worry about, such as whether the primary is
being driven too hard for the network to cope.

Copying direct from memory means that a disk problem that occurs on the
primary will never cause corruption on the standby. Reading WAL files
can mean that corruptions get propagated.

The current design allows for file based WAL sending, if the connection
is so poor that streaming won't work.
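
The two sending paths discussed above can be modelled with a small toy
in Python. This is a sketch under stated assumptions, not the patch's
design: WalSource, its fields and the byte-offset positions are all
hypothetical. Recent WAL is served from a fixed-size memory buffer, and
older ranges fall back to the file copy.

```python
class WalSource:
    """Toy model: serve WAL bytes from memory if still buffered, else from file."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.file = bytearray()      # stands in for WAL segment files on disk
        self.buffered = bytearray()  # stands in for wal_buffers (recent bytes only)
        self.buffer_start = 0        # WAL position of buffered[0]

    def append(self, data):
        # Every byte goes to the file copy; the memory copy keeps only the tail.
        self.file += data
        self.buffered += data
        overflow = len(self.buffered) - self.buffer_size
        if overflow > 0:
            del self.buffered[:overflow]
            self.buffer_start += overflow

    def read(self, start, end):
        """Return (source, data) for WAL positions [start, end)."""
        if start >= self.buffer_start:
            lo = start - self.buffer_start
            hi = end - self.buffer_start
            return "memory", bytes(self.buffered[lo:hi])
        # The range has already been evicted from memory: read the file copy.
        return "file", bytes(self.file[start:end])
```

In this model a standby that keeps up is always served from memory,
while one that falls behind by more than buffer_size is served from
file.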

If you are seriously suggesting these things now then I'd like to see
some diagrams, designs and descriptions so we can all understand what is
being suggested and how it will cope with all the current requirements.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
