Re: 9.0beta2 - server crash when using HS + SR

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.0beta2 - server crash when using HS + SR
Date: 2010-06-14 00:16:48
Message-ID: 4C1574F0.2070304@2ndquadrant.com
Lists: pgsql-hackers

Rafael Martinez wrote:
> A minimum and probably the only feasible thing for 9.0 will be to update
> the documentation. We need an entry in the hot-standby caveats section
> explaining that if you create a tablespace and the directory needed does
> not exist in the standby, the standby will shut itself down and will
> not be able to start until the directory is also created in the standby.
>

This is not a Hot Standby problem, and it's been documented since at
least http://www.postgresql.org/docs/8.2/static/warm-standby.html ; read
25.2.1 "Planning" in the current
http://developer.postgresql.org/pgdocs/postgres/warm-standby.html where
it's spelled out quite clearly.

It's a mixed blessing that getting a replicated server up is now so much
easier that people can get something going without reading that
particular document as carefully. But if there's a documentation change
to be made, it should be highlighting the warning already in that
section better; it's not something appropriate for the Hot Standby
caveats. Since this is clearly documented already, and there are bigger
problems to worry about for the current release, the real minimum action
to perform here (and the only one I would consider reasonable) is to
change nothing at this point for 9.0. I'm sorry you missed where this
was covered, but
adding redundant documentation for basics like this invariably leads to
the multiple copies becoming out of sync with one another as changes are
made in the future.

> 1) PostgreSQL creates the directory needed for the tablespace if the
> user running postgres has privileges to do so at the OS level.
> 2) The standby discovers that the directory needed does not exist and
> pauses recovery (without shutting down the server) at the WAL
> record that creates the tablespace. The standby will check periodically
> whether the directory has been created before resuming recovery.
>

Given that the idea behind a tablespace is that you want to relocate it
to a specific storage path, which may not map in the same way on the
standby, your first idea will never get implemented; it's not something
you want the server to guess about. As for the second, I would rather
see the standby go down--and hopefully set off some serious alarms for
the DBA who has screwed up here--than to stay up in a dysfunctional
polling state. The very serious mistake made is far more likely to be
discovered the way it's built right now.
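
As an illustration of the kind of check a DBA could run on the standby
before issuing CREATE TABLESPACE on the primary, here is a minimal
sketch. It is not part of PostgreSQL; the function name and the example
path are hypothetical, and it only tests that each directory exists. It
does not check ownership or that the directory is empty.

```python
import os


def missing_tablespace_dirs(paths):
    """Return the given paths that are not existing directories on this host.

    Hypothetical helper: run it on the standby with the locations you plan
    to use for tablespaces on the primary.
    """
    return [p for p in paths if not os.path.isdir(p)]


# Example usage (the path is made up for illustration):
# for path in missing_tablespace_dirs(["/mnt/fast_disk/pg_tblspc"]):
#     print("missing tablespace directory: " + path)
```

An empty result means every listed directory is present on the host
where the script ran.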

I wouldn't be averse to improving, in 9.1, the error messages the server
emits when this happens, to make it more obvious what's gone wrong.
That's the only genuine improvement I'd see value in here, to cut down
on other people running into what you did and being as confused by it.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
