Re: Re: Hot Standby query cancellation and Streaming Replication integration

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 03:12:22
Message-ID: 603c8f071002281912x3b34c2d5va2459fe18dd4c49a@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 28, 2010 at 5:38 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Greg, Joachim,
>
>> As I see it, the main technical obstacle here is that a subset of a
>> feature already on the SR roadmap needs to get built earlier than
>> expected to pull this off.  I don't know about Tom, but I have no
>> expectation it's possible for me to get up to speed on that code fast
>> enough to contribute anything there.  I expect the thing I'd be most
>> productive at as far as moving the release forward is to continue
>> testing this pair of features looking for rough edges, which is what I
>> have planned for the next month.
>
> That's OK with me.  I thought you were saying that xmin-pub was going to
> be easier than expected.  Per my other e-mails, I think that we should
> be shooting for "good enough, on time" for 9.0., rather than "perfect".
>  We can't ever get to "perfect" if we don't release software.

I agree. It seems to me that the right long term fix for the problem
of query cancellations on the slave is going to be to give the slave
the ability to save multiple versions of relation pages where
necessary so that older snapshots can continue to be used even after
the conflicting WAL has been applied. However, I'm pretty sure that's
going to be a very difficult project which is unlikely to be coded by
anyone any time soon, let alone merged. Until that happens, we're going
to force people to pick from a fairly unappealing menu of options:
postpone WAL replay for long periods of time, cancel queries (perhaps
even ones seemingly unrelated to what changed on the master), or bloat
the master. All of those options are seriously unpleasant.

I think, though, that we have to think of this as being like the
Windows port, or maybe even more significant than that, as an
architectural change. I think it is going to take several releases
for this feature to be well-understood and stable and have all the
options we'd like it to have. It wouldn't surprise me if we get to
10.0 before we really have truly seamless replication. I don't expect
Slony or Londiste or any of the other solutions that are out there now
to get kicked to the curb by PG 9.0. Still, a journey of a thousand
miles begins with the first step. Simon and many others have put a
great deal of time and energy into getting us to the point where we
are now, and if we let the fact that we haven't reached our ultimate
goal keep us from putting what we have out there in front of our
customers, I think we're going to regret that.

I think the thing to do is to reposition our PR around these features.
We should maybe even go so far as to call them "beta" or
"experimental". We shouldn't tell people that this is going to be
totally awesome. We should tell them that this is a big improvement,
and it's still got some pretty significant limitations, but it's good
stuff and it's going in a good direction. Overhyping what we have
today is not going to be good for the project, and I'm frankly quite
afraid that nothing we can possibly code between now and the release
is going to measure up to what people are hoping for. We need to set
our own expectations, and those of our customers, at a level at which
they can be met.

> Quite frankly, simply telling people that "long-running queries on the
> slave tend not to be effective, wait for 9.1" is a possibility.

Yep.

> HS+SR is still a tremendous improvement over the options available
> previously.  We never thought it was going to work for everyone
> everywhere, and shouldn't let our project's OCD tendencies run away from us.

Yep.

> However, I'd still like to hear from someone with the requisite
> technical knowledge whether capturing and retrying the current query in
> a query cancel is even possible.

I'm not sure who you want to hear from here, but I think that's a dead end.

...Robert
