Re: Sync Rep for 2011CF1

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep for 2011CF1
Date: 2011-02-08 19:34:15
Message-ID: AANLkTin7cxaaOohNQHq_S8ZmcCQLEc0RgE=X1_ZRx7JJ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 8, 2011 at 19:53, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Feb 7, 2011 at 1:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sat, Jan 15, 2011 at 4:40 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> Here's the latest patch for sync rep.
>>
>> Here is a rebased version of this patch which applies to head of the
>> master branch.  I haven't tested it yet beyond making sure that it
>> compiles and passes the regression tests -- but this fixes the bitrot.
>
> As I mentioned yesterday that I would do, I spent some time working on
> this.  I think that there are somewhere between three and six
> independently useful features in this patch, plus a few random changes
> to the documentation that I'm not sure whether we want or not (e.g.
> replacing master by primary in a few places, or the other way around).
>
> One problem with the core synchronous replication technology is that
> walreceiver cannot both receive WAL and write WAL at the same time.
> It switches back and forth between reading WAL from the network socket
> and flushing it to disk.  The impact of that is somewhat mitigated in
> the current patch because it only implements the "fsync" level of
> replication, and chances are that the network read time is small
> compared to the fsync time.  But it would certainly suck for the
> "receive" level we've talked about having in the past, because after
> receiving each batch of WAL, the WAL receiver wouldn't be able to send
> any more acknowledgments until the fsync completed, and that's bound
> to be slow.  I'm not really sure how bad it will be in "fsync" mode;
> it may be tolerable, but as Simon noted in a comment, in the long run
> it'd certainly be nicer to have the WAL writer process running during
> recovery.
>
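
A toy timing model of the point above, written in Python for illustration only. It is not code from the patch, the timings are made up, and it assumes WAL is always waiting on the socket. The second function also assumes a hypothetical separate writer process that keeps up with the fsyncs.

```python
# Toy timing model of a WAL receiver (hypothetical, not code from the patch).
# Assumes a steady stream: the next batch of WAL is always waiting on the socket.

def serialized_receive_acks(n_batches: int, read_ms: int, fsync_ms: int) -> list[int]:
    """Ack times when one loop reads a batch, fsyncs it, then reads the next.

    The receive-level ack for a batch can go out as soon as it is read, but
    the following read (and so the following ack) waits for the fsync.
    """
    acks = []
    t = 0
    for _ in range(n_batches):
        t += read_ms       # read one batch from the network socket
        acks.append(t)     # acknowledge receipt of that batch
        t += fsync_ms      # flush it to disk before reading again
    return acks


def concurrent_receive_acks(n_batches: int, read_ms: int) -> list[int]:
    """Ack times if a separate writer process did the fsyncs in the background."""
    return [read_ms * (i + 1) for i in range(n_batches)]


print(serialized_receive_acks(3, read_ms=1, fsync_ms=10))  # [1, 12, 23]
print(concurrent_receive_acks(3, read_ms=1))               # [1, 2, 3]
```

In the serialized loop every "receive" acknowledgment after the first is pushed back by a full fsync. In "fsync" mode the acknowledgment has to wait for the flush anyway, which is why the cost is smaller there.
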
> As a general comment on the quality of the code, I think that the
> overall logic is probably sound, but there are an awful lot of
> debugging leftovers and inconsistencies between various parts of the
> patch.  For example, when I initially tested it, *asynchronous*
> replication kept breaking between the master and the standby, and I
> couldn't figure out why.  I finally realized that there was a ten-second
> pause that had been inserted into the WAL receiver loop as a
> debugging tool which was allowing the standby to get far enough behind
> that the master was able to recycle WAL segments the standby still
> needed.  Under ordinary circumstances, I would say that a patch like
> this was not mature enough to submit for review, let alone commit.
> For that reason, I am pretty doubtful about the chances of getting
> this finished for 9.1 without some substantial prolongation of the
> schedule.
>
> That having been said, there is at least one part of this patch which
> looks to be in pretty good shape and seems independently useful
> regardless of what happens to the rest of it, and that is the code
> that sends replies from the standby back to the primary.  This allows
> pg_stat_replication to display the write/flush/apply log positions on
> the standby next to the sent position on the primary, which as far as
> I am concerned is pure gold.  Simon had this set up to happen only
> when synchronous replication or XID feedback was in use, but I think
> people are going to want it even with plain old asynchronous
> replication, because it provides a FAR easier way to monitor standby
> lag than anything we have today.  I've extracted this portion of the
> patch, cleaned it up a bit, written docs, and attached it here.

+1. I haven't actually looked at the patch, but having this ability
would be *great*.
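
To illustrate why that display is so handy: once the standby's write/flush/apply positions sit next to the sent position, lag in bytes is a subtraction. The sketch below is Python and not part of the patch. It assumes the positions are shown in the usual "X/Y" hexadecimal form. The function names are hypothetical, and it makes no assumption about what the view's columns are called.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Turn an 'X/Y' WAL position (two hex numbers) into one byte offset.

    Assumes the flat 64-bit layout (X = high 32 bits, Y = low 32 bits)
    used from PostgreSQL 9.3 on.  Servers of this era skipped the last
    segment of every 4GB logical log file, so on them a gap that crosses
    such a boundary is overstated by one segment per boundary.
    """
    hi, lo = lsn.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(primary_pos: str, standby_pos: str) -> int:
    """Bytes by which standby_pos trails primary_pos (e.g. sent vs. flush)."""
    return lsn_to_bytes(primary_pos) - lsn_to_bytes(standby_pos)


# Hypothetical positions, not taken from a real server:
print(lag_bytes("0/3000060", "0/3000000"))  # 96
```

Comparing the sent position against the apply position gives how far replay trails. Comparing it against the flush position gives how much WAL the standby has not yet made durable.
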

I also agree with the general idea of trying to break it into smaller
parts, even if each one only provides a small piece on its own. That
also makes it easier to get an overview of exactly how much is left,
to see where to focus.

> The only real complaint I can imagine about offering this
> functionality all the time is that it uses extra bandwidth.  I'm
> inclined to think that the ability to shut it off completely is
> sufficient answer to that complaint.

Yes, agreed.

I usually wouldn't worry about the bandwidth, really; I'd be more
worried about potentially increasing latency somewhere.

> <dons asbestos underwear>

The ones with little rocketships on them?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
