Re: Issues with two-server Synch Rep

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Issues with two-server Synch Rep
Date: 2010-10-11 18:07:51
Message-ID: 4CB35277.8060303@agliodbs.com
Lists: pgsql-hackers

Robert,

> I'll take a crack at answering these. I don't think that the
> procedure for setting up a standby server is going to change much.
> The idea is presumably that you set up an async standby more or less
> as you do now and then make whatever configuration changes are
> necessary to flip it to synchronous.

What is the specific "flip" procedure, though? For one thing, I want to
make sure that it's not necessary to restart the master or the standby
to "flip" it, since that would be a catch-22.

> This is a completely separate issue from making replication
> synchronous. And, really? Useless for running read queries?

Absolutely. For a synch standby, you can't tolerate any standby delay
at all. This means that anywhere from 1/4 to 3/4 of queries on the
standby would be cancelled on any high-traffic OLTP server. Hence,
"useless".
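Here is a toy model of why a zero standby delay cancels so much. When replay of a WAL record conflicts with a running query, replay may wait at most the configured delay. If the query needs longer than that, it is cancelled. The names `Conflict` and `cancelled_queries` are hypothetical, and the semantics are simplified. The real setting limits how far replay may fall behind overall; it does not give each query its own allowance.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    """A WAL record whose replay conflicts with a running standby query (toy model)."""
    query_id: int
    query_remaining_s: float  # time the conflicting query still needs to finish


def cancelled_queries(conflicts, max_standby_delay_s):
    """Return IDs of queries cancelled so that replay can proceed.

    Simplification: replay waits up to max_standby_delay_s for each conflicting
    query; any query that needs longer than that is cancelled.
    """
    return [c.query_id for c in conflicts
            if c.query_remaining_s > max_standby_delay_s]


conflicts = [Conflict(1, 0.2), Conflict(2, 5.0), Conflict(3, 30.0)]
print(cancelled_queries(conflicts, 0.0))   # [1, 2, 3]
print(cancelled_queries(conflicts, 10.0))  # [3]
```

With a zero delay, every query that conflicts with replay is cancelled, however short it is.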

>> As such, any Synch Rep patch
>> must work together with attempts to simplify administration. How does
>> your design do this?
>
> This is also completely out of scope for sync rep.

It is not, given that I've seen several proposals for synch rep which
would make asynch rep even more complicated than it already is. I'm
taking the stance that any sync rep design which *blocks* making asynch
rep easier to use is fundamentally flawed and can't be accepted.

> I don't think there's much hope of allowing administrators to take
> action BEFORE the database becomes unavailable.

I'd swear that you were working as a DBA less than a year ago, but I
couldn't tell it from that statement.

There is real value in letting DBAs view, and chart, the standby's ACK
response times. That way they can notice rising response times and
take action to improve the standby *before* it locks up the system.
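A minimal sketch of that kind of early warning follows. It keeps a rolling window of ACK latencies and flags the point where the window average crosses a threshold. `AckLatencyMonitor`, the window size, and the threshold are all hypothetical. It also assumes the sync rep implementation exposes a per-commit ACK time to sample; no pending patch is confirmed to do so.

```python
from collections import deque
from statistics import mean


class AckLatencyMonitor:
    """Track recent standby ACK latencies and flag a sustained slowdown (hypothetical)."""

    def __init__(self, window=5, threshold_ms=50.0):
        self.samples = deque(maxlen=window)  # only the last `window` samples are kept
        self.threshold_ms = threshold_ms

    def record(self, latency_ms):
        """Add one sample; return True once a full window averages above the threshold."""
        self.samples.append(latency_ms)
        full = len(self.samples) == self.samples.maxlen
        return full and mean(self.samples) > self.threshold_ms


mon = AckLatencyMonitor(window=3, threshold_ms=20.0)
alerts = [mon.record(ms) for ms in [5, 6, 5, 18, 30, 45]]
print(alerts)  # [False, False, False, False, False, True]
```

Averaging over a window, rather than alerting on one slow ACK, is meant to catch a trend before the standby stalls commits outright.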

> Presumably, if
> synchronous replication is disabled via (1) or (2) above, then any
> outstanding committed-but-unacknowledged-to-the-client transactions
> should notify the client of the commit and continue on.

That's what I was asking about. I'm not "presuming" that any pending
patch covers any such eventuality until it's confirmed.
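A toy model of the behaviour being asked about follows. Commits that are durable on the master wait for a standby ACK. If sync rep is switched off, every waiter is released and its client is told the commit succeeded. `SyncCommitQueue` and its methods are hypothetical. The sketch shows the requirement, not how any patch implements it.

```python
class SyncCommitQueue:
    """Toy model of commits waiting for a standby ACK (hypothetical names)."""

    def __init__(self):
        self.sync_enabled = True
        self.waiting = []   # xids committed locally, not yet reported to the client
        self.released = []  # xids whose clients have been told "committed"

    def commit(self, xid):
        # At this point the commit is already durable on the master.
        if self.sync_enabled:
            self.waiting.append(xid)   # block the client until the standby ACKs
        else:
            self.released.append(xid)  # async mode: report success immediately

    def standby_ack(self, xid):
        if xid in self.waiting:
            self.waiting.remove(xid)
            self.released.append(xid)

    def disable_sync(self):
        # Turning sync rep off must not strand waiters: release them all.
        self.sync_enabled = False
        self.released.extend(self.waiting)
        self.waiting.clear()


q = SyncCommitQueue()
q.commit(100)
q.commit(101)
q.standby_ack(100)
q.disable_sync()
q.commit(102)
print(q.released, q.waiting)  # [100, 101, 102] []
```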

> If a client loses the connection after issuing a commit but before
> receiving the acknowledgment, it can't know whether the commit
> happened or not. This is true regardless of whether there is a
> standby and regardless of whether that standby is synchronous.
> Clients that care need to implement their own mechanisms for resolving
> this difficulty.

That's a hand-wavy way of saying "go away, don't bother us with such
details". For the client to resolve the situation, *it* needs to be
able to tell whether or not the transaction was committed. How would
it do this, exactly?

> It's theoretically impossible for the transaction to become visible
> everywhere simultaneously. It's already the case that transactions
> become visible to other backends before the backend doing the commit
> has received an acknowledgment. Any client relying on any other
> behavior is already broken.

So, your opinion is "it's out of scope to handle this issue"?

> Sync rep is going to be slow, period. Every implementation currently
> on the table has to fsync on the master, and then send the commit xlog
> record to the slave and wait for an acknowledgment from the slave.
> Allowing those to happen in parallel is going to be Hard.

Yes, but it's something we need to address. XA is widely distrusted and
is seen as inadequate for high-traffic OLTP systems precisely because it
is SO slow. If we want to create a synch rep system which people will
want to use, then it has to be faster than XA. If it's not faster than
XA, why bother creating it? We already have 2PC.
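The cost of the serial ordering Robert describes can be put as simple arithmetic. The serial path pays the local fsync, the network round trip and the standby fsync one after another. An overlapped path would pay only the slower of the two branches. The function names are hypothetical, and the overlapped case is the hard part he mentions, not something any patch is known to do. The inputs below are illustrative numbers, not measurements.

```python
def serial_commit_ms(local_fsync, network_rtt, standby_fsync):
    """Master fsyncs first, then ships the commit record and waits for the ACK."""
    return local_fsync + network_rtt + standby_fsync


def overlapped_commit_ms(local_fsync, network_rtt, standby_fsync):
    """Hypothetical overlap: ship the record while the local fsync is in progress."""
    return max(local_fsync, network_rtt + standby_fsync)


# Illustrative inputs only: 5 ms local fsync, 1 ms RTT, 5 ms standby fsync.
print(serial_commit_ms(5.0, 1.0, 5.0))      # 11.0
print(overlapped_commit_ms(5.0, 1.0, 5.0))  # 6.0
```

Under these assumptions, overlapping the work comes close to halving commit latency. That is the kind of margin needed to beat XA-style two-phase commit.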

> Also, the
> interaction with max_standby_delay is going to be a big problem, I
> suspect.

Interaction? My opinion is that the two are completely incompatible.
You can't have synch rep and also have max_standby_delay > 0.
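The argument holds when the standby ACKs only after *replaying* the commit. In that case, replay held back by the standby delay sits directly on the master's commit path. This toy bound is only a sketch under that assumption; the function name is hypothetical. It treats a negative delay as "wait forever".

```python
def worst_case_commit_wait_ms(ack_rtt_ms, max_standby_delay_ms, ack_after_replay=True):
    """Upper bound on how long a master commit may block (toy model).

    Assumes that, when ack_after_replay is True, the standby ACKs only after
    replaying the commit record, and replay may be held back by up to
    max_standby_delay_ms while conflicting queries finish.
    """
    if not ack_after_replay:
        return ack_rtt_ms  # ACK on receipt/flush: replay delay is off the commit path
    if max_standby_delay_ms < 0:
        return float("inf")  # replay may wait indefinitely, so may the master
    return ack_rtt_ms + max_standby_delay_ms


print(worst_case_commit_wait_ms(2.0, 30000.0))  # 30002.0
print(worst_case_commit_wait_ms(2.0, 0.0))      # 2.0
```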

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
