Re: Immediate standby promotion

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "fabriziomello(at)gmail(dot)com" <fabriziomello(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-08-14 20:07:38
Message-ID: 1408046858.61496.YahooMailNeo@web122305.mail.ne1.yahoo.com
Lists: pgsql-hackers

Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> We already have the facilities to stop replay at a defined
>> place.  But then what?  Without this patch, do we tell the
>> customer to stop replay, do a pg_dump of the whole database, and
>> restore it into a new database?  Because that's crazy.
>
> Yeah... and as Fujii already said, another case is when some
> operational error occurs in the master (like a wrong "drop
> database") and we have a time-delayed standby that can be used to
> recover the mistake quickly.

I have been in the position of having an ad hoc data fix by someone
running raw SQL where they forgot the WHERE clause on a DELETE from
the table that just about everything joins to (the "Case" table
for a court system).  Since we had both PITR backups and logical
replication we were able to recover by kicking the users out, doing
a PITR recovery up to shortly before the mistake was made, and then
replaying the logical transaction stream from that point to the
end, excluding the bad transaction.

On the face of it, that doesn't sound like a big deal, right?  But
we had to kick out seven state Supreme Court justices, 16 Court of
Appeals judges, and the related support staff for a couple hours.
Trust me, with a delayed replica and the option of an immediate
promotion of the standby, I would have had a less stressful day.
Instead of telling all those people they couldn't use a key tool in
their workflow for two hours, I could have told them that there
would be a one or two minute outage after which any entries in the
last "n" minutes would be delayed in appearing in their view of the
data for two hours.  The justices would have been a lot happier,
and when they are happier, so is everyone else.  :-)

The suggested feature seems useful to me.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
