Skip site navigation (1) Skip section navigation (2)

Re: Re: Hot Standby query cancellation and Streaming Replication integration

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 05:50:08
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
Bruce Momjian wrote:
> Joachim Wieland wrote:
>> 1) With the current implementation they will see better performance on
>> the master and more aggressive vacuum (!), since they have less
>> long-running queries now on the master and autovacuum can kick in and
>> clean up with less delay than before. On the other hand their queries
>> on the standby might fail and they will start thinking that this HS+SR
>> feature is not as convincing as they thought it was...
> I assumed they would set max_standby_delay = -1 and be happy.

The admin in this situation might be happy until the first time the 
primary fails and a failover is forced, at which point there is an 
unbounded amount of recovery data to apply that was stuck waiting behind 
whatever long-running queries were active.  I don't know if you've ever 
watched what happens to a pre-8.2 cold standby when you start it up with 
hundreds or thousands of backed up WAL files to process before the 
server can start, but it's not a fast process.  I watched a production 
8.1 standby get >4000 files behind once due to an archive_command bug, 
and it's not something I'd like to ever chew my nails off to again.  If 
your goal was HA and you're trying to bring up the standby, the server 
is down the whole time that's going on.

This is why no admin who prioritizes HA would consider 
'max_standby_delay = -1' a reasonable setting, and those are the sort of 
users Joachim's example was discussing.  Only takes one rogue query that 
runs for a long time to make the standby so far behind it's useless for 
HA purposes.  And you also have to ask yourself "if recovery is halted 
while waiting for this query to run, how stale is the data on the 
standby getting?".  That's true for any large setting for this 
parameter, but using -1 for the unlimited setting also gives the maximum 
possible potential for such staleness.

'max_standby_delay = -1' is really only a reasonable idea if you are 
absolutely certain all queries are going to be short, which we can't 
dismiss as an unfounded use case so it has value.  I would expect you 
have to also combine it with a matching reasonable statement_timeout to 
enforce that expectation to make that situation safer.

In any of the "offload batch queries to the failover standby" 
situations, it's unlikely an unlimited value for this setting will be 
practical.  Perhaps you set max_standby_delay to some number of hours, 
to match your expected worst-case query run time and reduce the chance 
of cancellation.  Not putting a limit on it at all is a situation no DBA 
with healthy paranoia is going to be happy with the potential downside 
of in a HA environment, given that both unbounded staleness and recovery 
time are then both possible.  The potential of a failed long-running 
query is much less risky than either of those.

Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support

In response to


pgsql-hackers by date

Next:From: Fujii MasaoDate: 2010-03-02 06:20:36
Subject: Re: pg_stop_backup does not complete
Previous:From: Gokulakannan SomasundaramDate: 2010-03-02 05:43:21
Subject: Re: A thought on Index Organized Tables

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group