Re: Issues with two-server Synch Rep

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Issues with two-server Synch Rep
Date: 2010-10-12 00:44:30
Message-ID: AANLkTi=Z8ozyMbWHHrpFe1ewvv5hafX2rjMh+6HS6at6@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 11, 2010 at 7:16 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Mon, 2010-10-11 at 16:07 -0400, Robert Haas wrote:
>> > I was initially taken aback by the word "useless" as well. However, I
>> > had trouble thinking of a use case that isn't better solved by sync rep
>> > without HS, or async rep. I don't have the numbers either though, so
>> > perhaps someone does have a use case.
>>
>> The main use cases for synchronous replication seem to be (1) high
>> availability and (2) read scalability.  That is, if you have 99%
>> writes and 1% reads, you can round-robin the reads and do all the
>> writes on the master.  But I think we are quite a way from making (2)
>> work well enough to get excited about.
>
> [ I assume you meant "99% reads and 1% writes" ]

Oops, yes.
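To make the corrected 99%-reads case concrete, here is a minimal sketch of the round-robin arrangement described above: writes go to the master and reads rotate across the standbys. The `ReadWriteRouter` class and its SELECT-prefix test are hypothetical illustrations, not part of PostgreSQL or of any particular connection pooler.

```python
from itertools import cycle


class ReadWriteRouter:
    """Hypothetical router: writes go to the master, reads rotate over standbys."""

    def __init__(self, master, standbys):
        if not standbys:
            raise ValueError("need at least one standby to spread reads over")
        self.master = master
        self._standbys = cycle(standbys)  # endless round-robin iterator

    def route(self, sql):
        # Crude classification for illustration only: anything starting with
        # SELECT counts as a read. A real router would also have to handle
        # SELECT ... FOR UPDATE, functions with side effects, and so on.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._standbys)
        return self.master


router = ReadWriteRouter("master", ["standby1", "standby2", "standby3"])
queries = ["SELECT 1", "SELECT 2", "UPDATE t SET x = 1", "SELECT 3", "SELECT 4"]
print([router.route(q) for q in queries])
# ['standby1', 'standby2', 'master', 'standby3', 'standby1']
```

The routing itself is the easy part; what limits this setup is how the standbys behave when their reads conflict with replay.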

> Wouldn't the snapshot publication (as Josh called it) back to the master
> work better for that use case?

Well, that would help make it more useful. Of course, then bloat on
any one machine will bloat the entire cluster...

> I'm not even sure that it's the ratio that matters, but rather how
> constant the writes are. 1% writes does not necessarily mean that a
> random 1% of read queries fail on the standby. I don't have the numbers,
> but SR + query cancel seems like the standby system would effectively be
> down during write activity. I wouldn't be surprised if SR + query cancel
> resulted in some frustrated users; but perhaps "useless" is too strong a
> word.

Yeah.
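A rough model of why the write pattern matters more than the ratio: when the standby replays a cleanup record that removed row versions up to some transaction ID, any standby query whose snapshot might still see those versions conflicts with it. Replay can wait for a while; after that, the conflicting queries get cancelled. This is a simplification under stated assumptions: `StandbyQuery`, `queries_to_cancel`, and `max_delay` are hypothetical names, transaction IDs are plain integers (no wraparound), and `max_delay` merely stands in for the server's standby-delay settings.

```python
from dataclasses import dataclass


@dataclass
class StandbyQuery:
    name: str
    xmin: int  # oldest transaction ID this query's snapshot may still need to see


def queries_to_cancel(queries, latest_removed_xid, waited, max_delay):
    """Names of standby queries cancelled by replay of one cleanup record.

    Simplified model: xids are plain ints (no wraparound), and `waited` is how
    long replay has already stalled on this record, in seconds.
    """
    # A query conflicts if its snapshot could still see a row version the
    # cleanup removed, i.e. its xmin is at or before the latest removed xid.
    conflicting = [q.name for q in queries if q.xmin <= latest_removed_xid]
    if waited < max_delay:
        # Replay keeps waiting instead (the standby falls further behind).
        return []
    return conflicting


running = [StandbyQuery("report", xmin=100), StandbyQuery("lookup", xmin=205)]
print(queries_to_cancel(running, latest_removed_xid=150, waited=45.0, max_delay=30.0))
# ['report']
```

A single cleanup record can cancel every long-running query with an old enough snapshot at once, which is how steady write activity can make the standby effectively unusable for long queries even when writes are a small fraction of the workload.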

>> >> It would be far better if we could decouple master cleanup from
>> >> standby cleanup, so that only the machine that actually has the old
>> >> query gets bloated.  However, no one seems excited about writing that
>> >> code.
>> >
>> > That doesn't seem just a matter of code, it seems like a major design
>> > conflict.
>>
>> Yes.  I had the idea of trying to fix this by allowing the standby to
>> retain old versions of entire pages that got cleaned up on the master,
>> until the transactions that might want to read the old pages were
>> gone.  But that may be prohibitively difficult, not sure.
>
> I think you'd end up having a notion of a snapshot of block information
> (like a FS with snapshots) inside of postgres.

Yep.
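For what it's worth, the block-level snapshot idea amounts to something like this: before replaying a change to a page, the standby stashes the old page image tagged with the LSN at which it was superseded, serves that image to readers whose snapshots predate the change, and discards it once no such reader remains. This is a hypothetical in-memory sketch (`PageVersionStore` and its methods are made up for illustration); it ignores storage, crash safety, and everything else that would make the real thing hard.

```python
from collections import defaultdict


class PageVersionStore:
    """Hypothetical standby-side store that keeps superseded page images."""

    def __init__(self):
        self._current = {}             # block number -> current page image
        self._old = defaultdict(list)  # block -> [(lsn_replaced_at, image)], oldest first

    def replay_page_write(self, block, new_image, lsn):
        # Instead of overwriting, stash the old image tagged with the LSN at
        # which it stopped being current. Replay order keeps the list sorted.
        if block in self._current:
            self._old[block].append((lsn, self._current[block]))
        self._current[block] = new_image

    def read(self, block, snapshot_lsn):
        # A reader sees the image that was current as of its snapshot: the
        # first stashed version that was replaced after the snapshot was taken.
        for replaced_at, image in self._old[block]:
            if snapshot_lsn < replaced_at:
                return image
        return self._current[block]

    def prune(self, oldest_snapshot_lsn):
        # A version replaced at or before the oldest live snapshot can no
        # longer be read by anyone, so it can be thrown away.
        for block in list(self._old):
            self._old[block] = [
                (r, img) for r, img in self._old[block] if r > oldest_snapshot_lsn
            ]


store = PageVersionStore()
store.replay_page_write(7, "v1", lsn=10)
store.replay_page_write(7, "v2", lsn=20)
store.replay_page_write(7, "v3", lsn=30)
print(store.read(7, 15), store.read(7, 25), store.read(7, 35))  # v1 v2 v3
```

Even this toy version shows where the cost goes: the standby carries every superseded image until its oldest reader finishes, so the bloat lands on the standby rather than disappearing.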

> Sounds like a lot of complexity to me, and the only benefit I see is
> moving bloat from the primary to the standby. Granted, that would be
> nice, but I would expect some costs aside from just the complexity.

The standby is bloated either way, but you avoid propagating that
bloat back to the master. It's particularly pernicious if you have a
master and 17 standbys. Now any single standby with a long-running
query bloats all 18 machines. Not awesome.
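The arithmetic behind the 18-machine case: with snapshot publication, the master cannot remove a dead row version that any standby snapshot might still need, so its cleanup horizon becomes the oldest xmin reported anywhere, and the standbys inherit that horizon by replaying the master's WAL. The sketch below compares that with a hypothetical decoupled scheme where each machine honors only its own queries; the function name and the xmin values are illustrative assumptions, and xid wraparound is ignored.

```python
def effective_horizons(master_xmin, standby_xmins, coupled):
    """Per-machine cleanup horizon (index 0 is the master)."""
    local = [master_xmin, *standby_xmins]
    if coupled:
        # Snapshot publication: the master honors the oldest xmin anywhere,
        # and the standbys replay the master's cleanup, so all share it.
        return [min(local)] * len(local)
    # Hypothetical decoupled scheme: each machine only honors its own queries.
    return local


# 17 standbys, all caught up except one running a long report.
standbys = [5000] * 16 + [1200]
coupled = effective_horizons(5000, standbys, coupled=True)
decoupled = effective_horizons(5000, standbys, coupled=False)
print(sum(h <= 1200 for h in coupled), sum(h <= 1200 for h in decoupled))  # 18 1
```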

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
