Skip site navigation (1) Skip section navigation (2)

Re: Hot Standby query cancellation and Streaming Replication integration

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 23:56:55
Message-ID: 4B885FC7.5050209@2ndquadrant.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Bruce Momjian wrote:
> Doesn't the system already adjust the delay based on the length of slave
> transactions, e.g. max_standby_delay.  It seems there is no need for a
> user switch --- just max_standby_delay really high.
>   

The first issue is that you're basically saying "I don't care about high 
availability anymore" when you increase max_standby_delay to a high 
value.  Want to offload an 8 hour long batch report every day to the 
standby?  You can do it with max_standby_delay=8 hours.  But the day 
your master crashes 7 hours into that, you're in for a long wait before 
your standby is available while it replays all the queued up segments.  
Your 'hot standby' has actually turned into the old form of 'cold 
standby' just when you need it to be responsive.

This is also the reason why the whole "pause recovery" idea is a 
fruitless path to wander down.  The whole point of this feature is that 
people have a secondary server available for high-availability, *first 
and foremost*, but they'd like it to do something more interesting that 
leave it idle all the time.  The idea that you can hold off on applying 
standby updates for long enough to run seriously long reports is 
completely at odds with the idea of high-availability.

The second major problem is that the day the report actually takes 8.1 
hours instead, because somebody else ran another report that slowed you 
down a little, you're screwed if that's something you depend on being 
available--it just got canceled only *after* wasting 8 hours of 
reporting resource time.

max_standby_delay is IMHO only useful for allowing non-real-time web-app 
style uses of HS (think "Facebook status updates"), where you say "I'm 
OK giving people slightly out of date info sometimes if it lets me split 
the query load over two systems".  Set max_standby_delay to a few 
seconds or maybe a minute, enough time to service a typical user query, 
make your app tolerate the occasional failed query and only send big 
ones to the master, and you've just scaled up all the small ones.  
Distributed queries with "eventual consistency" on all nodes is where 
many of the web app designs are going, and this feature is a reasonable 
match for that use case.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com   www.2ndQuadrant.us


In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2010-02-27 00:03:06
Subject: Re: Avoiding bad prepared-statement plans.
Previous:From: Gokulakannan SomasundaramDate: 2010-02-26 23:36:25
Subject: Re: A thought on Index Organized Tables

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group