Re: pg_xlog-files not deleted on a standby after accidental server crash

From: naveen kumar <mnaveendba2(at)gmail(dot)com>
To: Paul Dunkler <paul(dot)dunkler(at)xyrality(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: pg_xlog-files not deleted on a standby after accidental server crash
Date: 2015-03-25 15:31:32
Message-ID: CADye41+aRFeq6-DW=N5U2ASddXTBn_YGedUAwprkF1USmbjiUw@mail.gmail.com
Lists: pgsql-admin

archive_cleanup_command = 'pg_archivecleanup <archive_location> %r'

Refer to this; it will help you:
http://www.postgresql.org/docs/9.3/static/pgarchivecleanup.html
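
For reference, a minimal sketch of how this could look in recovery.conf on the
standby (the archive directory below is just a placeholder for your own path):

archive_cleanup_command = 'pg_archivecleanup /mnt/server/archivedir %r'

You can also run the tool by hand with -n first, as a dry run that only prints
the archived segments it would remove (the WAL file name here is just an
example):

pg_archivecleanup -n /mnt/server/archivedir 000000010000003700000010

%r is replaced with the name of the file containing the last valid
restartpoint, so only segments older than that are removed from the archive
directory.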

Thanks & Regards,
M Naveen Kumar,
PostgreSQL Database Administrator,
Mobile Number: +91 7755929449.

On Wed, Mar 25, 2015 at 3:40 PM, Paul Dunkler <paul(dot)dunkler(at)xyrality(dot)com>
wrote:

> Hi there,
>
> thanks for your answer. We already have some "replay delay" checks
> implemented in Nagios, and they show everything as fine for this instance.
>
> Anyway, I have tested all your commands (thanks, I didn't know some of
> them), but the result is still the same. Replication is fine and is not
> lagging... but the pg_xlog content is growing and growing and growing.
>
> [root(at)ct-db1b pg_xlog]# ls -la | wc -l
> 1479
>
>
> > I might be wrong, because I'm still fairly new to the PostgreSQL
> world, but you can try this.
> >
> > I guess your slave is running properly; I think applying all the WALs is
> taking time.
> >
> > The commands below are useful for checking replication status.
> >
> > 1. pg_stat_replication view on master/primary server.
> >
> > postgres=# select * from pg_stat_replication ;
>
> postgres=# select * from pg_stat_replication;
> pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
> -------+----------+----------+------------------+-------------+-----------------+-------------+------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
> 15919 | 10 | postgres | walreceiver | 172.16.0.2 | | 50495 | 2015-03-24 09:46:12.59108+00 | | streaming | CA/CB1345F8 | CA/CB1345F8 | CA/CB1345F8 | CA/CB1345B8 | 0 | async
> (1 row)
>
> > 2. pg_is_in_recovery(): Function that tells whether the standby is still
> in recovery mode.
> >
> > postgres=# select pg_is_in_recovery();
>
> postgres=# select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
>
> > 3. pg_last_xlog_receive_location: Function that reports the location of the
> last transaction log received and written to disk by the standby via
> streaming replication.
> >
> > postgres=# select pg_last_xlog_receive_location();
> > 4. pg_last_xlog_replay_location: Function that reports the last transaction
> log location replayed during the recovery process.
> >
> > postgres=# select pg_last_xlog_replay_location();
> >
> > 5. pg_last_xact_replay_timestamp: This function reports the timestamp of
> the last transaction that was replayed during recovery. Below is an
> example:
> >
> > postgres=# select pg_last_xact_replay_timestamp();
>
> postgres=# select pg_last_xlog_receive_location(); select
> pg_last_xlog_replay_location(); select pg_last_xact_replay_timestamp();
> pg_last_xlog_receive_location
> -------------------------------
> CA/CAF27218
> (1 row)
>
> pg_last_xlog_replay_location
> ------------------------------
> CA/CAF27218
> (1 row)
>
> pg_last_xact_replay_timestamp
> -------------------------------
> 2015-03-25 09:54:54.766285+00
> (1 row)
>
>
> > 6. Lag in bytes: For calculating lag in bytes, users can use the
> pg_stat_replication view on the master together with the
> pg_xlog_location_diff function.
> >
> > postgres=# select
> pg_xlog_location_diff(pg_stat_replication.sent_location,
> pg_stat_replication.replay_location) from pg_stat_replication;
>
> postgres=# SELECT client_hostname, client_addr,
> pg_xlog_location_diff(pg_stat_replication.sent_location,
> pg_stat_replication.replay_location) AS byte_lag FROM pg_stat_replication;
> client_hostname | client_addr | byte_lag
> -----------------+-------------+----------
> | 172.16.0.2 | 13024
> (1 row)
>
> > 7. Calculating lag in seconds. The following is the SQL that most people
> use to find the lag in seconds:
> >
> > SELECT CASE WHEN pg_last_xlog_receive_location() =
> pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() -
> pg_last_xact_replay_timestamp())
> > END AS log_delay;
>
> postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() =
> pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() -
> pg_last_xact_replay_timestamp())
> END AS log_delay;
> log_delay
> -----------
> 0
> (1 row)
>
> > 8. If the slave is up in hot standby mode, you can tell, in seconds, how
> far behind the transactions applied on the slave are with this query:
> >
> > postgres=# select now() - pg_last_xact_replay_timestamp() AS
> replication_delay;
>
> postgres=# select now() - pg_last_xact_replay_timestamp() AS
> replication_delay;
> replication_delay
> -------------------
> 00:00:00.296506
> (1 row)
>
> > And please verify archive_cleanup_command in recovery.conf on the slave.
>
> I don't use this command on any slaves...
>
> >
> >
> > Thanks & Regards,
> > M Naveen Kumar,
> > PostgreSQL Database Administrator,
> > Mobile Number: +91 7755929449.
> >
> > On Tue, Mar 24, 2015 at 9:47 PM, Paul Dunkler <paul(dot)dunkler(at)xyrality(dot)com>
> wrote:
> > Hi there,
> >
> > I am currently encountering a weird problem:
> >
> > 3 days ago we had an accidental server crash on one of our database
> servers (the system just restarted).
> > After it was up again, I restarted the databases (we are running 4 instances
> of Postgres on a 512 GB / 64-processor server). All databases came up fine
> again.
> > Afterwards I restarted the slave Postgres instances (running on another
> physical server); they recovered correctly and went ahead with
> receiving WAL via streaming replication. All fine so far.
> >
> > Then I realized that on only 1 of the 4 slave instances, the WAL file
> count under data/pg_xlog was rising and rising; it seems like WAL files are
> not reused / deleted by Postgres.
> > Afterwards I tried different things:
> > 1. Restarted the slave multiple times, without any effect on the xlog
> behaviour
> > 2. Set up the slave again from a fresh base backup of the master
> instance
> > 3. Checked file permissions, and even tried manually deleting a WAL file
> under data/pg_xlog (which worked)
> >
> > Now I'm running out of ideas as to why these files are not reused/deleted
> anymore.
> > I don't see any errors in the logfiles (neither on the slave nor on the master).
> >
> > The slave is only showing normal "restartpoint" logs:
> >> Mar 24 10:51:10 ct-db1b ct-db1-1[65432]: [3-1] db=, user=, host= LOG:
> restartpoint starting: time
> >> Mar 24 10:53:42 ct-db1b ct-db1-1[65432]: [4-1] db=, user=, host= LOG:
> restartpoint complete: wrote 21264 buffers (0.4%); 0 transaction log
> file(s) added, 0 removed, 16 recycled; write=149.345 s, sync=2.217 s,
> total=151.682 s; sync files=1156, longest=0.213 s, average=0.001 s
> >> Mar 24 10:53:42 ct-db1b ct-db1-1[65432]: [5-1] db=, user=, host= LOG:
> recovery restart point at C3/E9AD3258
> >> Mar 24 10:53:42 ct-db1b ct-db1-1[65432]: [5-2] db=, user=, host=
> DETAIL: last completed transaction was at log time 2015-03-24
> 09:53:42.247769+00
> >> Mar 24 10:56:10 ct-db1b ct-db1-1[65432]: [6-1] db=, user=, host= LOG:
> restartpoint starting: time
> >> Mar 24 10:58:41 ct-db1b ct-db1-1[65432]: [7-1] db=, user=, host= LOG:
> restartpoint complete: wrote 6402 buffers (0.1%); 0 transaction log file(s)
> added, 0 removed, 3 recycled; write=149.516 s, sync=1.101 s, total=150.757
> s; sync files=729, longest=0.160 s, average=0.001 s
> >> Mar 24 10:58:41 ct-db1b ct-db1-1[65432]: [8-1] db=, user=, host= LOG:
> recovery restart point at C3/EDA836D8
> >> Mar 24 10:58:41 ct-db1b ct-db1-1[65432]: [8-2] db=, user=, host=
> DETAIL: last completed transaction was at log time 2015-03-24
> 09:58:40.611415+00
> >> Mar 24 11:01:10 ct-db1b ct-db1-1[65432]: [9-1] db=, user=, host= LOG:
> restartpoint starting: time
> >> Mar 24 11:03:41 ct-db1b ct-db1-1[65432]: [10-1] db=, user=, host= LOG:
> restartpoint complete: wrote 11746 buffers (0.2%); 0 transaction log
> file(s) added, 0 removed, 5 recycled; write=149.652 s, sync=0.936 s,
> total=150.725 s; sync files=672, longest=0.147 s, average=0.001 s
> >> Mar 24 11:03:41 ct-db1b ct-db1-1[65432]: [11-1] db=, user=, host= LOG:
> recovery restart point at C3/F046DA18
> >> Mar 24 11:03:41 ct-db1b ct-db1-1[65432]: [11-2] db=, user=, host=
> DETAIL: last completed transaction was at log time 2015-03-24
> 10:03:40.914148+00
> >> Mar 24 11:06:10 ct-db1b ct-db1-1[65432]: [12-1] db=, user=, host= LOG:
> restartpoint starting: time
> >> Mar 24 11:08:41 ct-db1b ct-db1-1[65432]: [13-1] db=, user=, host= LOG:
> restartpoint complete: wrote 11155 buffers (0.2%); 0 transaction log
> file(s) added, 0 removed, 4 recycled; write=149.780 s, sync=1.184 s,
> total=151.080 s; sync files=758, longest=0.184 s, average=0.001 s
> >> Mar 24 11:08:41 ct-db1b ct-db1-1[65432]: [14-1] db=, user=, host= LOG:
> recovery restart point at C3/F3C51B40
> >> Mar 24 11:08:41 ct-db1b ct-db1-1[65432]: [14-2] db=, user=, host=
> DETAIL: last completed transaction was at log time 2015-03-24
> 10:08:41.087379+00
> >> Mar 24 11:11:10 ct-db1b ct-db1-1[65432]: [15-1] db=, user=, host= LOG:
> restartpoint starting: time
> >> Mar 24 11:13:41 ct-db1b ct-db1-1[65432]: [16-1] db=, user=, host= LOG:
> restartpoint complete: wrote 12602 buffers (0.2%); 0 transaction log
> file(s) added, 0 removed, 2 recycled; write=149.632 s, sync=1.309 s,
> total=151.067 s; sync files=728, longest=0.203 s, average=0.001 s
> >> Mar 24 11:13:41 ct-db1b ct-db1-1[65432]: [17-1] db=, user=, host= LOG:
> recovery restart point at C3/F93AC8F0
> >> Mar 24 11:13:41 ct-db1b ct-db1-1[65432]: [17-2] db=, user=, host=
> DETAIL: last completed transaction was at log time 2015-03-24
> 10:13:41.168975+00
> >> Mar 24 11:16:10 ct-db1b ct-db1-1[65432]: [18-1] db=, user=, host= LOG:
> restartpoint starting: time
> >> Mar 24 11:18:41 ct-db1b ct-db1-1[65432]: [19-1] db=, user=, host= LOG:
> restartpoint complete: wrote 10234 buffers (0.2%); 0 transaction log
> file(s) added, 0 removed, 0 recycled; write=149.411 s, sync=1.175 s,
> total=150.703 s; sync files=821, longest=0.142 s, average=0.001 s
> >> Mar 24 11:18:41 ct-db1b ct-db1-1[65432]: [20-1] db=, user=, host= LOG:
> recovery restart point at C3/FDD61320
> >> Mar 24 11:18:41 ct-db1b ct-db1-1[65432]: [20-2] db=, user=, host=
> DETAIL: last completed transaction was at log time 2015-03-24
> 10:18:41.052593+00
> >> Mar 24 11:21:10 ct-db1b ct-db1-1[65432]: [21-1] db=, user=, host= LOG:
> restartpoint starting: time
> >> Mar 24 11:23:40 ct-db1b ct-db1-1[65432]: [22-1] db=, user=, host= LOG:
> restartpoint complete: wrote 9105 buffers (0.2%); 0 transaction log file(s)
> added, 0 removed, 0 recycled; write=149.598 s, sync=0.842 s, total=150.579
> s; sync files=687, longest=0.061 s, average=0.001 s
> >> Mar 24 11:23:40 ct-db1b ct-db1-1[65432]: [23-1] db=, user=, host= LOG:
> recovery restart point at C4/290A078
> >> Mar 24 11:23:40 ct-db1b ct-db1-1[65432]: [23-2] db=, user=, host=
> DETAIL: last completed transaction was at log time 2015-03-24
> 10:23:40.446308+00
> >
> > Postgres-Version:
> >
> >> postgres=# select version();
> >> version
> >>
> --------------------------------------------------------------------------------------------------------------
> >> PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
> 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
> >> (1 row)
> >
> > I would really appreciate somebody's advice on this.
> >
> > Thanks in advance.
>
> --
> Mit freundlichen Grüßen / Kind regards
>
> Paul Dunkler
>
>
