On Wed, Jun 30, 2010 at 12:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> One thought that occurred to me is that if the master and standby were
> more tightly coupled, you could recover after a crash by making the
> one with the further-advanced WAL position the master, and the other
> one the standby. That would get around this problem, though at the
> cost of considerable additional complexity. But then if one of the
> servers comes up and can't talk to the other, you need some mechanism
> for preventing split-brain syndrome.

Users should be free to build infrastructure to allow that. But we
can't just switch ourselves -- we don't know what other pieces of
their systems need to be updated when the master changes.

We also need to stop thinking in terms of one master and one slave.
They could have dozens of slaves and in case of failover would want to
pick the slave with the most recent WAL position. The way I picture
that happening, they're monitoring all their slaves in some monitoring
tool and using that data to pick the new master. Some external tool
picks the new master, tells that host, all the other slaves, and the
rest of their infrastructure where to find the new master, and does
whatever is necessary to restart or reload configurations.
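
To make the selection step concrete, here is a rough sketch of what
such an external tool might do once it has collected a WAL position
from each slave. It assumes the positions arrive as the "hi/lo" hex
text that a function like pg_last_xlog_replay_location() returns on a
standby; the host names and the function names in the code are
hypothetical, and how the tool actually reaches each slave is left
out.

```python
def parse_wal_position(text):
    """Turn a 'hi/lo' hex WAL position such as '0/3000060' into a tuple.

    The two halves are compared as a pair of integers, because comparing
    the raw strings would order '0/A000000' and '0/10000000' incorrectly.
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16), int(lo, 16))


def pick_new_master(positions):
    """Return the host with the most advanced WAL position.

    `positions` maps a host name to its reported WAL position string, or
    to None when the monitoring tool could not reach that slave.
    Unreachable slaves are skipped. Ties go to the host name that sorts
    first, so the choice is deterministic.
    """
    reachable = {
        host: parse_wal_position(pos)
        for host, pos in positions.items()
        if pos is not None
    }
    if not reachable:
        raise RuntimeError("no reachable slave reported a WAL position")
    return max(sorted(reachable), key=reachable.get)


if __name__ == "__main__":
    # Hypothetical data, as a monitoring tool might have collected it.
    reported = {
        "slave-a": "0/3000060",
        "slave-b": "0/A000000",
        "slave-c": None,  # unreachable
    }
    print(pick_new_master(reported))
```

This only covers the "who is furthest ahead" decision. Telling the
chosen host to take over, and repointing everything else, is the part
that depends on the interfaces discussed next.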

The question, I think, is what interfaces we need in Postgres to make
this easy. The monitoring tool needs a way to find the current WAL
position from the slaves even when the master is down. That means
potentially needing to start up the slaves in read-only mode with no
master at all. It also means making it easy for an external tool to
switch a node from slave to master and to change a slave's master. And
it means a slave should be able to change masters and easily pick up
where it left off. I'm not sure what interfaces are currently
recommended for an external tool to perform these operations.