Re: monitoring warm standby lag in 8.4?

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: monitoring warm standby lag in 8.4?
Date: 2010-12-10 19:13:06
Message-ID: AANLkTimHVJrrUYsJBXBtvgdo9P75BhQEd9iYJpMb55-B@mail.gmail.com
Lists: pgsql-general

On Fri, Dec 10, 2010 at 11:27 AM, Greg Sabino Mullane <greg(at)turnstep(dot)com> wrote:
> Correct. But since we cannot connect to a database in recovery mode,
> there are very few options to determine how far 'behind' it is. The
> pg_controldata is what the check_postgres program uses. This offers a
> rough check which is usually sufficient unless you have a very
> inactive database or need very fine grained checking.
>
> A better system would perhaps connect to both ends and examine which
> specific WALs were being shipped and which one was last played, but
> there are no tools I know of that do that. I suspect the reason for
> this is that the pg_controldata check is "good enough". Certainly,
> that's what we are using for many clients via check_postgres, and
> it's been very good at detecting when the replica has problems. Good
> enough that I've never worried about writing a different method,
> anyway. :)

Thanks for the reply.

One simple check I added to my monitoring script, which isn't in the
script here:
http://www.kennygorman.com/wordpress/?p=249
(or in check_postgres.pl, judging from a quick look at its
check_checkpoint() function), is a verification that the standby is
actually in 'in archive recovery' mode, based on the 'Database
cluster state:' line of pg_controldata's output.
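
For illustration, a minimal sketch of that kind of state check might look
like the following. It is not the code from my script or from
check_postgres.pl; the function names, the data directory argument, and the
choice to force LC_ALL=C (so the label text is not translated) are all
assumptions made for the example.

```python
import os
import subprocess


def parse_cluster_state(controldata_output):
    """Return the value of the 'Database cluster state:' line, or None."""
    for line in controldata_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Database cluster state":
            return value.strip()
    return None


def standby_in_recovery(data_dir, pg_controldata="pg_controldata"):
    """Run pg_controldata on data_dir and check the reported state.

    data_dir and the pg_controldata path are hypothetical parameters;
    adjust them for the local installation.
    """
    env = dict(os.environ, LC_ALL="C")  # assumption: keep labels in English
    result = subprocess.run(
        [pg_controldata, data_dir],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return parse_cluster_state(result.stdout) == "in archive recovery"
```

A monitoring script could call standby_in_recovery() on the standby's data
directory and raise an alert whenever it returns False.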

I was mulling over some ways to add in a reasonable check that the
standby was keeping up with the WAL stream. Comparing WAL file names
on master vs. standby would probably work, but I was also thinking
that a simple directory-size check on the standby's WAL archive
directory would show whether we were receiving WAL files faster than
we could process them.
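
A rough sketch of the directory-size idea, assuming the standby removes
segments from its archive directory once they have been replayed, so that a
growing backlog suggests it is falling behind. The function names and the
max_segments threshold are hypothetical, and the pattern only matches plain
24-character WAL segment names (it skips .history and .backup files).

```python
import os
import re

# WAL segment file names are 24 uppercase hexadecimal characters.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def wal_backlog(archive_dir):
    """Return (segment_count, total_bytes) for WAL segments in archive_dir."""
    count = 0
    total_bytes = 0
    with os.scandir(archive_dir) as entries:
        for entry in entries:
            if entry.is_file() and WAL_SEGMENT_RE.match(entry.name):
                count += 1
                total_bytes += entry.stat().st_size
    return count, total_bytes


def backlog_too_large(archive_dir, max_segments=32):
    """True if more than max_segments segments are waiting in archive_dir.

    max_segments is an arbitrary example threshold, not a recommendation.
    """
    count, _ = wal_backlog(archive_dir)
    return count > max_segments
```

The right threshold depends on how bursty the master's WAL generation is, so
it would need tuning per installation.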

Josh
