Re: Re: Hot Standby query cancellation and Streaming Replication integration

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 06:07:20
Message-ID: 4B8A0818.5000100@2ndquadrant.com
Lists: pgsql-hackers

Robert Haas wrote:
> It seems to me that if we're forced to pass the xmin from the
> slave back to the master, that would be a huge step backward in terms
> of both scalability and performance, so I really hope it doesn't come
> to that.

Not forced to--have the option of. There are obviously workloads where
you wouldn't want this. At the same time, I think there are some pretty
common ones people are going to expect HS+SR to work on transparently
where this would obviously be the preferred trade-off to make, were it
available as one of the options. The test case I put together shows an
intentionally pathological but not completely unrealistic example of
such a workload.

> I wish I understood better exactly what you mean by "the
> notion of synchronizing the WAL stream against slave queries" and why
> you don't think it will work. Can you elaborate?
>

There's this constant WAL stream coming in from the master to the
slave. Each time the slave is about to apply a change from that stream,
it considers "will this disrupt one of the queries I'm already
executing?". If so, it has to make a decision about what to do; that's
where the synchronization problem comes from.

The current two options are "delay applying the change", at which point
the master and standby will drift out of sync until the query ends and
it can catch back up, or "cancel the query". There are tunables for
each of these, and they all seem to work fine (albeit without too much
testing in the field yet). My concern is that the tunable that tries to
implement the other thing you might want to optimize for--"avoid letting
the master generate WAL entries that are the most likely ones to
conflict"--just isn't very usable in its current form.
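
As a rough illustration of the two options above, here is a toy model in
Python of the choice the standby faces for each incoming WAL change. This
is not PostgreSQL code: the `decide` function, the `Action` enum, and the
use of `None` to mean "wait indefinitely" are all assumptions made for the
sketch. Only the name max_standby_delay comes from the discussion itself.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    APPLY = "apply the change"
    WAIT = "delay applying the change"
    CANCEL_QUERY = "cancel the conflicting query"


def decide(conflicts: bool,
           waited_s: float,
           max_standby_delay_s: Optional[float]) -> Action:
    """Toy decision for one WAL change on the standby (hypothetical helper).

    conflicts           -- would applying this change disrupt a running query?
    waited_s            -- how long replay has already been held up, in seconds
    max_standby_delay_s -- delay budget; None is this sketch's stand-in for
                           "wait indefinitely" (an assumption, not a real setting)
    """
    if not conflicts:
        # Nothing running on the standby cares about this change.
        return Action.APPLY
    if max_standby_delay_s is None or waited_s < max_standby_delay_s:
        # Option 1: hold replay back; the standby drifts behind the master.
        return Action.WAIT
    # Option 2: the delay budget is spent, so the query loses.
    return Action.CANCEL_QUERY
```

With a 30 second budget, a conflict seen after 5 seconds of waiting gives
`Action.WAIT`, and the same conflict once 30 seconds have passed gives
`Action.CANCEL_QUERY`. Neither branch does anything about the third goal,
keeping the master from generating the conflicting WAL in the first place.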

Tom and I don't see completely eye to eye on this, in that I'm not so
sure the current behaviors are "fundamentally wrong and we will never be
able to make [them] work". If that's really the case, you may not ever
get the scalability/performance results you're hoping for from this
release, and really we're all screwed if those are the only approaches
available.

What I am sure of is that an SR-based xmin passing approach is simpler,
easier to explain, more robust for some common workloads, and less
likely to give surprised "wow, I didn't think *that* would cancel my
standby query" reports from the field than any way you can configure Hot
Standby alone right now. And since I never like to bet against Tom's
gut feel, having it around as a "plan B" in case he's right about an
overwhelming round of bug reports piling up against the
max_standby_delay etc. logic doesn't hurt either.

I spent a little time today seeing if there was any interesting code I
might steal from the early "synchrep" branch at
http://git.postgresql.org/gitweb?p=users/fujii/postgres.git;a=summary ,
but sadly, when I tried to rebase that against the master to separate out
just the parts unique to it, the merge conflicts were overwhelming. I
hate getting beaten by merge bitrot even when Git is helping.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
