Re: Hot Standby query cancellation and Streaming Replication integration

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby query cancellation and Streaming Replication integration
Date:	2010-03-01 18:46:29
Message-ID:	407d949e1003011046v525543eao4e677edfa8253722@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I don't think that defer_cleanup_age is a long-term solution. But we
> need *a* solution which does not involve delaying 9.0.

So I think the primary solution currently is to raise max_standby_age.

However, there is a concern with max_standby_age. Suppose you set it
to, say, 300s, then run a 300s query on the slave which causes the
slave to fall 299s behind. Now you start a new query on the slave -- it
gets a snapshot based on the point in time that the slave is currently
at. If it hits a conflict it will only have 1s to finish before the
conflict causes the query to be cancelled.
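
The arithmetic above can be sketched as follows. This is a minimal
illustration, assuming the simplified model described in this message
(the cancellation deadline is measured against how far the slave is
behind, not against when the query started). The function name
grace_seconds and its parameters are hypothetical and are not part of
PostgreSQL.

```python
def grace_seconds(max_standby_age: float, replay_lag: float) -> float:
    """Time a newly started query has left once it hits a conflict.

    Simplified model (an assumption, not PostgreSQL's actual code):
    a conflicting query is cancelled as soon as the slave is
    max_standby_age seconds behind. A negative max_standby_age
    (such as -1) means queries are never cancelled.
    """
    if max_standby_age < 0:
        return float("inf")
    return max(0.0, float(max_standby_age) - float(replay_lag))


# The example from the text: 300s limit, slave already 299s behind.
print(grace_seconds(300, 299))
# A slave that is fully caught up gives a new query the whole budget.
print(grace_seconds(300, 0))
# -1 disables cancellation entirely.
print(grace_seconds(-1, 299))
```

In this model the budget a query actually receives depends on the
slave's lag when the conflict occurs, which the query itself cannot
control.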

In short, in the current setup I think there is no safe value of
max_standby_age which will prevent query cancellations short of -1. If
the slave has a constant stream of queries and always has at least one
concurrent query running then it's possible that the slave will run
continuously max_standby_age-epsilon behind the master and cancel
queries left and right, regardless of how large max_standby_age is.
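
A toy simulation can make this steady state visible. It is only a
sketch under loud assumptions, not a model of the real recovery code:
time advances in 1s ticks, every running query conflicts with replay
(the worst case), replay is paused while any query runs, and when
unblocked the slave replays replay_speed seconds of WAL per second.
The names simulate, query_len, arrival_every and replay_speed are
hypothetical.

```python
def simulate(max_standby_age, query_len, arrival_every, total_time,
             replay_speed=2):
    """Return (finished, cancelled, lag_samples) for a toy slave."""
    replay_pos = 0          # master timestamp the slave has replayed up to
    running = []            # remaining seconds of each running query
    finished = cancelled = 0
    lag_samples = []

    for now in range(1, total_time + 1):
        # A new query arrives at a fixed interval.
        if now % arrival_every == 0:
            running.append(query_len)

        # Once the slave is max_standby_age behind, every conflicting
        # query is cancelled so that replay can proceed.
        if running and now - replay_pos >= max_standby_age:
            cancelled += len(running)
            running = []

        if running:
            # Replay is blocked; the queries make one second of progress.
            running = [r - 1 for r in running]
            finished += running.count(0)
            running = [r for r in running if r > 0]
        else:
            # Replay runs, but only at a finite speed.
            replay_pos = min(now, replay_pos + replay_speed)

        lag_samples.append(now - replay_pos)

    return finished, cancelled, lag_samples


finished, cancelled, lags = simulate(
    max_standby_age=300, query_len=10, arrival_every=5, total_time=3600)
print("finished:", finished, "cancelled:", cancelled)
print("lag over the last 10 minutes:", min(lags[-600:]), "to", max(lags[-600:]))
```

With overlapping 10s queries the slave in this model never gets a long
enough gap to catch up, so once the lag first reaches the limit it
stays just below it and short queries start being cancelled. Raising
max_standby_age only delays the point at which that begins.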

To resolve this I think you would have to introduce some chance for
the slave to catch up. Something like refusing to use a snapshot older
than max_standby_age/2 and instead waiting until the existing queries
finish and the slave gets a chance to catch up and see a more recent
snapshot. The problem is that this would result in very unpredictable
and variable response times from the slave. A single long-lived query
could cause replay to pause for a big chunk of max_standby_age and
prevent any new query from starting.
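
To get a feel for how long new queries could be frozen out under such
a rule, here is a rough back-of-the-envelope sketch. It assumes a
single long-lived query that blocks replay from the moment it starts,
a slave that was fully caught up beforehand, and a replay rate of
replay_speed seconds of WAL per second once unblocked. The function
freeze_out_seconds and all of its parameters are hypothetical.

```python
def freeze_out_seconds(max_standby_age, long_query_len, replay_speed=2.0):
    """Seconds during which no new query may start under a
    'refuse snapshots older than max_standby_age/2' rule.

    Toy model: the long query pauses replay until it ends or is
    cancelled at max_standby_age; afterwards the lag shrinks by
    (replay_speed - 1) seconds per second.
    """
    if replay_speed <= 1:
        raise ValueError("the slave can only catch up if replay_speed > 1")

    threshold = max_standby_age / 2
    # Lag reached when the long query ends or is cancelled.
    peak_lag = min(long_query_len, max_standby_age)
    if peak_lag <= threshold:
        return 0.0

    # Frozen while the lag climbs from the threshold to its peak ...
    blocked_while_paused = peak_lag - threshold
    # ... and while replay works the lag back down to the threshold.
    blocked_while_catching_up = (peak_lag - threshold) / (replay_speed - 1)
    return blocked_while_paused + blocked_while_catching_up


print(freeze_out_seconds(300, 100))   # short query: threshold never crossed
print(freeze_out_seconds(300, 280))   # long query: lengthy freeze-out
```

The result swings from zero to several minutes depending on one
query's runtime, which is the unpredictability described above.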

Does anyone see any way to guarantee that the slave gets a chance to
replay and new snapshots will become visible without freezing out new
queries for extended periods of time?

--
greg

In response to

Re: Re: Hot Standby query cancellation and Streaming Replication integration at 2010-03-01 17:50:52 from Josh Berkus

Responses

Re: Re: Hot Standby query cancellation and Streaming Replication integration at 2010-03-01 19:21:48 from Josh Berkus
Re: Re: Hot Standby query cancellation and Streaming Replication integration at 2010-03-02 21:36:52 from Bruce Momjian
Re: Re: Hot Standby query cancellation and Streaming Replication integration at 2010-03-02 23:39:10 from Bruce Momjian
