Summary and Plan for Hot Standby

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Summary and Plan for Hot Standby
Date: 2009-11-15 08:06:09
Message-ID: 1258272369.14054.993.camel@ebony
Lists: pgsql-hackers


After some time thinking about the best way forward for Hot Standby, I
have some observations and proposals.

First, the project is very large. We have agreed ways to trim the patch,
yet it remains large. Trying to do everything in one lump is almost
always a bad plan, so we need to phase things.

Second, everybody is keen that HS hits the tree, so we can have alpha
code etc. There are a few remaining issues that should *not* be rushed.
The only way to remove this dependency is to decouple parts of the
project.

Third, testing the patch is difficult and continuous change makes it
harder to guarantee everything is working.

There are two remaining areas of significant thought/effort:

* Issues relating to handling of prepared transactions
* How fast Hot Standby mode is enabled in the standby

I propose that we stabilise and eventually commit a version of HS that
circumvents/defers those issues and then address the issues with
separate patches afterwards. This approach will allow us to isolate the
areas of further change so we can have a test blitz to remove silly
mistakes, then follow it with a commit to CVS, and then release as Alpha
to allow further testing.

Let's look at the two areas of difficulty in more detail:

* Issues relating to handling of prepared transactions
There are some delicate issues surrounding what happens at the end of
recovery if there is a prepared transaction still holding an access
exclusive lock. It is straightforward to say, as an interim measure,
"Hot Standby will not work with max_prepared_transactions > 0". I see
that the underlying problem has a fiddly, yet fairly clear, solution.
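
As a rough illustration only, the interim restriction could be expressed
as a startup check along the following lines. This is a toy sketch in
Python, not the patch's code: the function name and the hot_standby flag
are hypothetical, and only max_prepared_transactions comes from the text
above.

```python
def check_hot_standby_config(hot_standby: bool,
                             max_prepared_transactions: int) -> None:
    """Refuse to start in hot standby mode if prepared transactions are enabled.

    Hypothetical helper: models the interim rule "Hot Standby will not
    work with max_prepared_transactions > 0" as a plain config check.
    """
    if hot_standby and max_prepared_transactions > 0:
        raise ValueError(
            "hot standby requires max_prepared_transactions = 0 "
            f"(got {max_prepared_transactions})"
        )
```

A check of this kind keeps the restriction in one place, so it could be
dropped once the end-of-recovery locking issue gets its own patch.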

* How fast Hot Standby mode is enabled in the standby
We need to have full snapshot information on the standby before we can
allow connections and queries. There are two basic approaches: i) we
wait until we *know* we have full info or ii) we try to collect data and
inject a correct starting condition. Waiting (i) may take a while, but
is clean and requires only a few lines of code. Injecting the starting
condition (ii) requires boatloads of hectic code and we have been unable
to agree a way forwards. If we did have that code, all it would give us
is a faster/more reliable starting point for connections on the standby.
Until we can make approach (ii) work, we should just rely on the easy
approach (i). In many cases, the starting point is very similar. (In
some cases we can actually make (i) faster because the overhead of data
collection forces us to derive the starting conditions minutes apart.)
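
To make approach (i) concrete, here is a minimal toy model in Python. It
is a sketch under stated assumptions, not the actual recovery code: the
record kinds ("begin", "commit", "abort", "running_xacts") and all names
are hypothetical, and it assumes the primary occasionally logs a record
carrying the complete list of in-progress transactions. The standby
simply replays records and refuses connections until it has seen one.

```python
from dataclasses import dataclass, field


@dataclass
class StandbySnapshotState:
    """Toy model of what the standby knows about running transactions."""
    running: set = field(default_factory=set)
    ready: bool = False  # True once full snapshot information is known

    def replay(self, record: tuple) -> None:
        kind = record[0]
        if kind == "running_xacts":
            # Hypothetical record: the complete list of transactions in
            # progress on the primary at this point in the log.
            self.running = set(record[1])
            self.ready = True
        elif kind == "begin" and self.ready:
            # Only trustworthy once we have a complete starting point.
            self.running.add(record[1])
        elif kind in ("commit", "abort"):
            self.running.discard(record[1])


def records_until_ready(log):
    """Replay the log; return (records consumed, state) once ready.

    Returns (None, state) if the log ends before full info is known.
    """
    state = StandbySnapshotState()
    for n, record in enumerate(log, start=1):
        state.replay(record)
        if state.ready:
            return n, state
    return None, state
```

In this model the wait is however long it takes for the next complete
record to arrive, which is why (i) is simple but may take a while.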

Phasing the commit seems like the only way.

Please can we agree a way forwards?

--
Simon Riggs www.2ndQuadrant.com
