Re: Streaming replication status

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication status
Date: 2010-01-13 08:47:33
Message-ID: 4B4D88A5.2050905@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Stefan Kaltenbrunner wrote:
> so is there an actually concrete proposal of _what_ interals to expose? '

The pieces are coming together...summary:

-Status quo: really bad, but could probably ship anyway because
existing PITR is no better and people manage to use it
-Add slave pg_current_xlog_location() and something like
pg_standby_received_xlog_location(): Much better, gets rid of the worst
issues here.
-Also add pg_standbys_xlog_location() on the master: while they could
live without it, this really helps out the "alert/monitor" script writer
whose use cases keep popping up here.

Details...the original idea from Fujii was:

"I'm thinking something like pg_standbys_xlog_location() [on the
primary] which returns
one row per standby servers, showing pid of walsender, host name/
port number/user OID of the standby, the location where the standby
has written/flushed WAL. DBA can measure the gap from the
combination of pg_current_xlog_location() and pg_standbys_xlog_location()
via one query on the primary."

After some naming quibbles and questions about what direction that
should happen in, Tom suggested the initial step here is:

"It seems to me that we should have at least two functions available
on the slave: latest xlog location received and synced to disk by
walreceiver (ie, we are guaranteed to be able to replay up to here);
and latest xlog location actually replayed (ie, the state visible
to queries on the slave). The latter perhaps could be
pg_current_xlog_location()."

So there's the first two of them: on the slave,
pg_current_xlog_location() giving the latest location replayed, and a
new one named something like pg_standby_received_xlog_location(). If
you take the position that an unreachable standby does provide answers
to these questions too (you just won't like them), this pair might be
sufficient to ship.

To help a lot at dealing with all the error situations where the standby
isn't reachable and segments are piling up (possibly leading to full
disk), the next figure that seems to answer the most questions is asking
the primary "what's the location of the last WAL segment file in the
pile of ones to be archived/distributed that has been requested (or
processed if that's the easier thing to note) by the standby?". That's
what is named pg_standbys_xlog_location() in the first paragraph I
quoted. If you know enough to identify that segment file on disk, you
can always look at its timestamp (and the ones on the rest of the files
in that directory) in a monitoring script to turn that information into
segments or a time measurement instead--xlog segments are nicely ordered
after all.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2010-01-13 08:47:48 Re: Streaming replication and non-blocking I/O
Previous Message Boszormenyi Zoltan 2010-01-13 08:22:28 Re: ECPG patch causes warning