Re: Streaming replication status

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication status
Date: 2010-01-12 08:04:35
Message-ID: 4B4C2D13.9060302@2ndquadrant.com
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Greg Smith wrote:
>
>> I don't think anybody can deploy this feature without at least some very
>> basic monitoring here. I like the basic proposal you made back in
>> September for adding a pg_standbys_xlog_location to replace what you
>> have to get from ps right now:
>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>
>> That's basic, but enough that people could get by for a V1.
>>
>
> It would be more straightforward to have a function in the standby to
> return the current replay location. It feels more logical to poll the
> standby to get the status of the standby, instead of indirectly from the
> master. Besides, the master won't know how far the standby is if the
> connection to the standby is broken.
>

This is one reason I suggested in my other message collecting simple
stats on how large the archive_command backlog is. I'd think that's an
easy way to tell the DBA "the standby isn't keeping up and the disk is
filling" in a way that's more database-centric than just watching disk
space get gobbled up.
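A minimal sketch of that backlog check, assuming direct filesystem access to the primary's data directory. The archiver leaves a "<segment>.ready" marker in pg_xlog/archive_status for each WAL segment that archive_command has not yet handled successfully, so counting those markers gives the number of segments still waiting. The function name archive_backlog is hypothetical, and the 16 MB segment size is only the default, not a guarantee.

```python
import os
from typing import Tuple

# Assumption: default WAL segment size of 16 MB (it can be changed at build time).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def archive_backlog(archive_status_dir: str) -> Tuple[int, int]:
    """Return (segments_waiting, approx_bytes_waiting) for WAL not yet archived.

    Hypothetical helper: counts "<segment>.ready" markers, which remain until
    archive_command succeeds for that segment. Only 24-hex-digit segment names
    are counted, so timeline .history files do not inflate the byte estimate.
    """
    waiting = 0
    for name in os.listdir(archive_status_dir):
        if not name.endswith(".ready"):
            continue  # ".done" markers are segments already archived
        stem = name[: -len(".ready")]
        if len(stem) == 24 and set(stem) <= HEX_DIGITS:
            waiting += 1
    return waiting, waiting * WAL_SEGMENT_BYTES
```

Polling this periodically and alerting when the count keeps growing gives the "standby isn't keeping up" signal before the disk actually fills.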

I think that it's important to be able to get whatever useful
information you can from both the primary and the standby, because most
of the interesting (read: painful) situations here are when one or the
other is down. The fundamental questions here are:

- When things are running normally, how much is the standby lagging by?
This is needed for a baseline of good performance, by which you can
detect problems before they get too bad.
- If the standby is down altogether, how can I get more information about
the state of things from the primary?
- If the primary is down, how can I tell more from the standby?
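One way to turn the first question into a number, sketched under assumptions: given the primary's current WAL location and the standby's replay location as "X/Y" strings, the lag is the byte distance between them. The helper names are hypothetical, and X and Y are treated as the high and low 32 bits of a 64-bit byte position; that is an assumption, since the exact arithmetic has differed between PostgreSQL releases. Returning None when either reading is missing reflects the failure cases in the other two questions, where only one side can be asked.

```python
from typing import Optional


def parse_xlog_location(loc: str) -> int:
    """Convert an "X/Y" WAL location (two hex halves) into one integer.

    Assumption: X is the high 32 bits and Y the low 32 bits of a byte position.
    """
    high, low = loc.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_loc: Optional[str],
                          standby_replay_loc: Optional[str]) -> Optional[int]:
    """Bytes of WAL the standby has not yet replayed, or None if unknown.

    None means one server is down or unreachable, so lag cannot be computed
    and the caller has to fall back on whatever the surviving side reports.
    """
    if primary_loc is None or standby_replay_loc is None:
        return None
    lag = parse_xlog_location(primary_loc) - parse_xlog_location(standby_replay_loc)
    # Clamp at zero: the two readings are not taken at the same instant,
    # so the standby can briefly appear to be ahead of the stale primary value.
    return max(0, lag)
```

For example, replication_lag_bytes("0/3000000", "0/2000000") is 16777216, one default-sized segment behind; tracking that number over time provides the baseline the first question asks for.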

Predicting what people are going to want to do when one of these bad
conditions pops up is a large step beyond where I think this
discussion should be focusing now. You have to show how you're going
to measure the badness in the likely failure situations before you
can take action on them. If you do the former well enough, admins
will figure out how to deal with the latter in a way compatible with
their business processes in the first version.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com
