Re: Hot Standby query cancellation and Streaming Replication integration

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 23:56:55
Message-ID: 4B885FC7.5050209@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bruce Momjian wrote:
> Doesn't the system already adjust the delay based on the length of slave
> transactions, e.g. max_standby_delay. It seems there is no need for a
> user switch --- just max_standby_delay really high.
>

The first issue is that you're basically saying "I don't care about high
availability anymore" when you increase max_standby_delay to a high
value. Want to offload an 8 hour long batch report every day to the
standby? You can do it with max_standby_delay=8 hours. But the day
your master crashes 7 hours into that, you're in for a long wait before
your standby is available while it replays all the queued up segments.
Your 'hot standby' has actually turned into the old form of 'cold
standby' just when you need it to be responsive.

This is also the reason why the whole "pause recovery" idea is a
fruitless path to wander down. The whole point of this feature is that
people have a secondary server available for high-availability, *first
and foremost*, but they'd like it to do something more interesting that
leave it idle all the time. The idea that you can hold off on applying
standby updates for long enough to run seriously long reports is
completely at odds with the idea of high-availability.

The second major problem is that the day the report actually takes 8.1
hours instead, because somebody else ran another report that slowed you
down a little, you're screwed if that's something you depend on being
available--it just got canceled only *after* wasting 8 hours of
reporting resource time.

max_standby_delay is IMHO only useful for allowing non-real-time web-app
style uses of HS (think "Facebook status updates"), where you say "I'm
OK giving people slightly out of date info sometimes if it lets me split
the query load over two systems". Set max_standby_delay to a few
seconds or maybe a minute, enough time to service a typical user query,
make your app tolerate the occasional failed query and only send big
ones to the master, and you've just scaled up all the small ones.
Distributed queries with "eventual consistency" on all nodes is where
many of the web app designs are going, and this feature is a reasonable
match for that use case.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-02-27 00:03:06 Re: Avoiding bad prepared-statement plans.
Previous Message Gokulakannan Somasundaram 2010-02-26 23:36:25 Re: A thought on Index Organized Tables