Re: Synchronous Standalone Master Redoux

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous Standalone Master Redoux
Date: 2012-07-17 06:45:43
Message-ID: CAAZKuFZLReW_XdMs=TgvgcqW=xtjAL7WwB+NfkMdDEki-oBZ-Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 16, 2012 at 10:58 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> BTW, one little detail that I don't think has been mentioned in this thread
> before: Even though the master currently knows whether a standby is
> connected or not, and you could write a patch to act based on that, there
> are other failure scenarios where you would still not be happy. For example,
> imagine that the standby has a disk failure. It stays connected to the
> master, but fails to fsync anything to disk. Would you want to fall back to
> degraded mode and just do asynchronous replication in that case? How do you
> decide when to do that in the master? Or what if the standby keeps making
> progress, but becomes incredibly slow for some reason, like disk failure in
> a RAID array? I'd rather outsource all that logic to external monitoring
> software - software that you should be running anyway.

I would like to lend some support to the idea that this is not an
edge case. Outside of simple loss of availability of a server, losing
access to a block device is probably the second-most-common cause of
loss of availability for me. It's especially insidious because simple
"select 1" checks may continue to return for quite some time, so
instead we rely on parsing Linux diskstats to see if write progress
hits zero for "a while."
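
To make the diskstats check concrete, here is a minimal sketch in
Python. It is not the monitoring code referred to above; the function
names, the sampling interval, and the window size are all hypothetical.
It assumes the usual /proc/diskstats layout, where each line is major,
minor, device name, and then the per-device counters, with "sectors
written" as the seventh counter.

```python
import time


def sectors_written(diskstats_text, device):
    """Return the cumulative sectors-written counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, name, then the counters. Sectors written
        # is the 7th counter, which puts it at index 9 of the split line.
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise KeyError(device)


def write_stalled(samples):
    """True if the counter did not advance between first and last sample."""
    return len(samples) >= 2 and samples[-1] == samples[0]


def watch(device, interval=5.0, window=6, path="/proc/diskstats"):
    """Poll until no write progress is seen across `window` samples.

    interval and window are illustrative values for "a while", not
    recommendations.
    """
    samples = []
    while True:
        with open(path) as f:
            samples.append(sectors_written(f.read(), device))
        samples = samples[-window:]
        if len(samples) == window and write_stalled(samples):
            return samples  # the caller decides what to do about it
        time.sleep(interval)
```

A flat counter only means something on a device that is expected to be
written to regularly; a genuinely idle disk looks the same, so a check
like this has to be paired with knowledge of the workload.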

In cases like these, the overhead of a shell command to rapidly
consult a decision-making process can be prohibitive -- it's
already a pretty big waste of time for me in WAL
archiving/dearchiving, where process startup, SSL negotiation, and
lack of parallelization can be pretty slow. A shell-command hook
here may exhibit the same problem.

I would like to plead that whatever is done would be most useful if
it were controllable in its entirety via non-GUCs -- arguably that is
already the case, since one can write a replication protocol client to
do the job by faking the standby status update messages, but perhaps
there is a more lucid way if one makes accommodation. In particular,
the awkwardness of using pg_receivexlog[0] or a similar tool for
replacing archive_command is something that I feel should be addressed
eventually, so that it is not a second-class citizen. Although that is
already being worked on[1]...the archive command has no backpressure
either, other than "out of disk".

The case of restore_command is even more sore: remastering or
archive-recovery via streaming protocol actions is kind of a pain at
the moment. I haven't thoroughly explored this yet and I don't think
it is documented, but it can be hard for something that is dearchiving
from WAL segments stored somewhere to find exactly the right record to
start replaying at: the WAL record format is not stable, and it need
not be, if the server helps by ignoring records that predate what it
requires or can inform the process feeding WAL that it got things
wrong. Maybe that is the case, but it is not documented. I also
don't think there are any guarantees around the maximum size or
alignment of WAL shipped by the streaming protocol in XLogData
messages, and that's too bad. Also, the endianness of WAL position
fields in XLogData is host-byte-order-dependent, which sucks if you
are forwarding WAL around but need to know what range is contained in
a message. In practice many people can say "all I have is
little-endian," but it is somewhat unpleasant and not necessarily the
case.
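
To illustrate the byte-order point, here is a small Python sketch. It
assumes a WAL position is carried as two unsigned 32-bit fields (a log
id and an offset) written in the sender's native byte order; that is
my reading of the current layout and should be checked against the
server source. The helper names are hypothetical, and this is not a
parser for real XLogData messages.

```python
import struct


def pack_position(log_id, offset, byte_order):
    """Encode a position as two uint32 fields.

    byte_order is "<" for little-endian or ">" for big-endian.
    """
    return struct.pack(byte_order + "II", log_id, offset)


def unpack_position(raw, byte_order):
    """Decode two uint32 fields using the given byte order."""
    return struct.unpack(byte_order + "II", raw)


# A little-endian sender writes position 2/5A000000 ...
raw = pack_position(0x2, 0x5A000000, "<")

# ... and a forwarder has to know the sender's byte order to read it back.
right = unpack_position(raw, "<")
wrong = unpack_position(raw, ">")
print("correct guess: %X/%X" % right)
print("wrong guess:   %X/%X" % wrong)
```

Nothing in the bytes themselves says which interpretation is right,
so a process forwarding WAL between hosts has to learn the sender's
byte order out of band.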

Correct me if I'm wrong; I'd be glad for it.

[0]: see the notes section,
http://www.postgresql.org/docs/devel/static/app-pgreceivexlog.html
[1]: http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php

--
fdr
