Re: Cascading replication: should we detect/prevent cycles?

From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Subject: Re: Cascading replication: should we detect/prevent cycles?
Date: 2012-12-20 22:28:43
Message-ID: 1090902042.155329.1356042523541.JavaMail.root@agliodbs.com
Lists: pgsql-hackers

Robert,

> > What would such a test look like? It's not obvious to me that
> > there's any rapid way for a user to detect this situation, without
> > checking each server individually.
>
> Change something on the master and observe that none of the supposed
> standbys notice?

That doesn't sound like an infallible test, or a 60-second one.

My point is that in a complex situation (imagine a shop with 9 replicated servers in 3 different cascaded groups, immediately after a failover of the original master), it would be easy for a sysadmin, responding to a middle-of-the-night page, to accidentally fat-finger an IP address and create a cycle instead of a new master. And once he's done that, there's a longish troubleshooting process to figure out what's wrong and why writes aren't working, especially if he goes to bed and some other sysadmin picks up the "Writes failing to PostgreSQL" ticket.

*If* it's relatively easy for us to detect cycles (that's a big if; I'm not sure how we'd do it), then it would help a lot for us to at least emit a WARNING. That would short-cut a lot of troubleshooting.
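
To make the idea concrete, here is a minimal sketch of the check done from outside the servers. It is not a proposal for how the backend would do it. It assumes someone has already collected a map from each server to the upstream it replicates from (for example, by reading each standby's primary_conninfo), with None for a server that has no upstream. The function name, the map, and the host names are all hypothetical.

```python
from typing import Dict, List, Optional


def find_replication_cycle(
    upstream_of: Dict[str, Optional[str]], start: str
) -> Optional[List[str]]:
    """Follow upstream links from `start`.

    Returns the hosts forming a cycle (first host repeated at the end),
    or None if the chain ends at a server with no upstream.
    `upstream_of` is a hypothetical, hand-collected map; a host missing
    from it is treated the same as a host with no upstream.
    """
    seen_at: Dict[str, int] = {}  # host -> position in path
    path: List[str] = []
    node: Optional[str] = start
    while node is not None:
        if node in seen_at:
            # We came back to a host already on the path: that is a cycle.
            return path[seen_at[node]:] + [node]
        seen_at[node] = len(path)
        path.append(node)
        node = upstream_of.get(node)
    return None


# Healthy cascade: c replicates from b, b from a, a is the master.
healthy = {"a": None, "b": "a", "c": "b"}

# Fat-fingered failover: "a" was pointed at "c" instead of being promoted.
broken = {"a": "c", "b": "a", "c": "b"}

print(find_replication_cycle(healthy, "c"))
print(find_replication_cycle(broken, "c"))
```

The hard part in practice is the one this sketch skips: gathering the map requires visiting every server, which is exactly the per-server check I'd like to avoid at 3am.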

--Josh Berkus
