
Re: Re: Hot Standby query cancellation and Streaming Replication integration

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 01:34:53
Message-ID: 4B8C6B3D.1060107@2ndquadrant.com
Lists: pgsql-hackers
Josh Berkus wrote:
> However, this leaves aside Greg's point about snapshot age and
> successive queries; does anyone dispute his analysis?  Simon?
>   

There's already a note on the Hot Standby TODO about unexpectedly bad 
max_standby_delay behavior being possible on an idle system, with no 
suggested resolution for it besides better SR integration.  The issue 
Greg Stark has noted is another variation on that theme.  It's already 
on my list of theorized, pathological, but as yet undemonstrated concerns 
that Simon and I identified, the list I'm working through by creating test 
cases to prove or disprove each one.  I'm past "it's possible..." talk at this 
point, though, so as not to spook anyone unnecessarily, and am only raising 
things I can show concrete examples of in action.  White box testing at 
some point does require pausing one's investigation of what's in the box 
and getting on with the actual testing instead.

The only real spot where my opinion diverges here is that I have yet to 
find any situation where 'max_standby_delay=-1' makes sense to me.  
When I run my test cases with that setting, the whole system 
just reacts far too strangely.  My first patch here is probably going to 
add more visibility into the situation where queries are blocking 
replication forever, because I think that's what is happening when I find 
myself asking "why is the system hung right now?", and as an admin it's 
not obvious what's going on.

Also, the idea that a long-running query on the standby could cause an 
unbounded delay in replication is so foreign to my sensibilities that I 
don't ever include it in the list of useful solutions to the problems 
I'm worried about.  The option is there, and I'm not disputing that it makes 
sense for some people, since there seems to be some demand for it; I just 
can't see how it fits into any of the use-cases I'm concerned about.
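To make the trade-off concrete, here is a deliberately simplified Python model of a single replay/query conflict. It is a sketch under stated assumptions, not PostgreSQL's actual logic: it assumes replay starts waiting the moment it reaches a conflicting WAL record, and it ignores how the real server measures the delay (which is where the idle-system concern comes from). The names `resolve_conflict` and `ConflictOutcome` are hypothetical. With a finite max_standby_delay the replay wait is capped and long queries get cancelled; with -1 the wait simply grows with the query's runtime.

```python
from dataclasses import dataclass


@dataclass
class ConflictOutcome:
    replay_wait_s: float   # how long WAL replay sat blocked behind the query
    query_cancelled: bool  # whether the standby query was cancelled


def resolve_conflict(query_remaining_s: float, max_standby_delay_s: float) -> ConflictOutcome:
    """Hypothetical toy model of one replay/query conflict on a Hot Standby server.

    Assumes replay starts waiting the instant it reaches a conflicting WAL
    record, and that query_remaining_s is how much longer the query would run.
    A max_standby_delay_s of -1 means "wait forever".
    """
    if query_remaining_s < 0:
        raise ValueError("query_remaining_s must be non-negative")
    if max_standby_delay_s == -1:
        # Replay never gives up: it stays blocked for the query's full remaining runtime.
        return ConflictOutcome(replay_wait_s=query_remaining_s, query_cancelled=False)
    if max_standby_delay_s < 0:
        raise ValueError("max_standby_delay_s must be -1 or >= 0")
    if query_remaining_s <= max_standby_delay_s:
        # The query finishes within the grace period, so nothing is cancelled.
        return ConflictOutcome(replay_wait_s=query_remaining_s, query_cancelled=False)
    # The grace period expires first: the query is cancelled and replay resumes.
    return ConflictOutcome(replay_wait_s=max_standby_delay_s, query_cancelled=True)


if __name__ == "__main__":
    for remaining in (10, 600, 36000):
        bounded = resolve_conflict(remaining, 30)
        unbounded = resolve_conflict(remaining, -1)
        print(f"query needs {remaining}s more: delay=30 -> {bounded}; delay=-1 -> {unbounded}")
```

In this model the bounded setting never holds replay back more than 30 seconds, while the -1 setting leaves replay stalled for the full ten hours of a 36000-second query, which is the unbounded replication delay described above.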

I haven't said anything about query retry mainly because I can't imagine 
any way it's possible to build it in time for this release, so whether 
it's eventually feasible or not doesn't enter into what I'm worried 
about right now.  In any case, I would prioritize that behind work on 
preventing the most common situations that cause cancellations in the 
first place, until those are handled so well that retry is the most 
effective improvement left to consider.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com   www.2ndQuadrant.us

