Why are some WAL files in pg_xlog symlinks to old files?

From: Nigel <nigelspleen(at)gmail(dot)com>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Why are some WAL files in pg_xlog symlinks to old files?
Date: 2010-09-29 01:15:15
Message-ID: AANLkTi=-WfvBJ-87pdJCJrUNEvPFoHeXPNUF6hF0CAKY@mail.gmail.com
Lists: pgsql-admin

Hello,

We're running PG 8.3 in a warm standby configuration. About 3 weeks ago we
had to fail over from the primary to the standby. That worked fine, but
we're having problems getting standby mode set up again. On the new
standby, everything works fine for a little while: WALs are rsynced over
and processed correctly as far as I can tell. But every 65-75 minutes (very
regularly), a WAL file is copied that's actually a symlink. When the
standby tries to read the rsynced symlink, it hangs indefinitely, presumably
because the target of the link doesn't exist on the standby.

In the primary's pg_xlog, I see the expected WAL files with increasing
numbers and recent modification dates, but every 65-75 files there's one of
these symlinks. For example:

Sep 28 16:13 0000000300000A5C00000070
Sep 28 16:15 0000000300000A5C00000071
Sep 28 16:12 0000000300000A5C00000072
Sep 5 01:00 0000000300000A5C00000073 -> /srv/db/chdbprod_wal_archives/00000001000009D6000000D6
Sep 28 16:21 0000000300000A5C00000074
Sep 28 16:19 0000000300000A5C00000075

The "/srv/db/chdbprod_wal_archives" directory is where incoming WAL files
used to go, back when the current primary server was the standby. The
September 5 date you see above is shortly before the failover was done. It
confused me at first until I remembered that it's the mod date of the target
of the symlink, not the link itself (which in this case was presumably
created around 16:20). The target of the symlinks is always the same.
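
For anyone who wants to check their own pg_xlog the same way, here is a minimal sketch (not a PostgreSQL tool; the function name `find_symlinked_segments` is made up for illustration). It lists WAL segments that are symlinks, reports whether each target exists on the local machine, and reads the timestamp of the link itself with `os.lstat` rather than the timestamp of its target. It assumes segment names are 24 uppercase hex digits.

```python
import os
import re

# Assumption: WAL segment file names are exactly 24 uppercase hex digits.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def find_symlinked_segments(wal_dir):
    """Return (name, target, target_exists, link_mtime) for each symlinked segment."""
    results = []
    for name in sorted(os.listdir(wal_dir)):
        path = os.path.join(wal_dir, name)
        # Skip history files, subdirectories, and ordinary segment files.
        if not WAL_NAME.match(name) or not os.path.islink(path):
            continue
        target = os.readlink(path)
        # os.path.exists follows the link, so it is False for a dangling link.
        target_exists = os.path.exists(path)
        # os.lstat describes the link itself; os.stat would describe the target.
        link_mtime = os.lstat(path).st_mtime
        results.append((name, target, target_exists, link_mtime))
    return results
```

Run on the standby after an rsync, a row whose `target_exists` is false would correspond to the dangling link described above.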

pg_xlog also contains a 00000003.history file, which references the target
of the symlinks. Here's its contents:

1 00000001000009D6000000D6 before transaction 0 at 2000-01-01 00:00:00+00
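
To make the file names easier to compare, here is a small sketch that splits a segment name into its parts. It assumes the usual 24-hex-digit layout: 8 digits of timeline ID, 8 of log file number, 8 of segment number. The names `WalSegment` and `parse_wal_name` are hypothetical helpers, not part of PostgreSQL.

```python
import re
from typing import NamedTuple

# Assumption: names are 24 hex digits, split 8/8/8 as timeline, log, segment.
WAL_NAME = re.compile(r"^[0-9A-Fa-f]{24}$")


class WalSegment(NamedTuple):
    timeline: int
    log: int
    seg: int


def parse_wal_name(name):
    """Split a WAL segment file name into timeline, log, and segment numbers."""
    if not WAL_NAME.match(name):
        raise ValueError("not a 24-hex-digit WAL segment name: %r" % name)
    return WalSegment(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        seg=int(name[16:24], 16),
    )
```

Applied to the names above, the symlink 0000000300000A5C00000073 parses as timeline 3, while its target 00000001000009D6000000D6 parses as timeline 1, the same segment that the history file names.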

I gather that my problems here are due to having a primary server that was
itself formerly a standby, but I'm not sure what action to take. I don't
know enough about how the history files work and what the significance of
the symlinks is. What purpose do the symlinks serve? Why are they
recreated regularly at slightly more than hourly intervals? Why do they
point to a directory that was only used back when the primary was a
standby? (If it makes any difference, back when the primary server was a
standby, it was running pg_standby with the -l option.) Does their presence
mean that something's wrong on the primary, or should they be ignored when
copying to the standby?

Thanks in advance for any information!
Chris
