Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date: 2014-10-08 13:00:42
Message-ID: CAB7nPqTOGD6QGeTXrd8EGyMLAy83c49shHZzaDoQOeFV+ZPaFA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
wrote:

> 1. Where do the FF files come from? In 9.2, FF-segments are not supposed
> to be created, ever.
>
> Since this only happens with streaming replication, the FF segments are
> probably being created by walreceiver. XLogWalRcvWrite is the function that
> opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite
> opens the file corresponding to the start position in the message received
> from the master. There is no check that the start position is valid,
> though; if the master sends a start position in the FF segment, walreceiver
> will merrily write it. So the problem could be in the walsender side.
> However, I don't see anything wrong there either.
>
Good to hear that. By looking at the wal receiver and sender code paths, I
found nothing really weird.

> I think we should add a check in walreceiver, to throw an error if the
> master sends an invalid WAL pointer, pointing to an FF segment.
>
Then a check on walEnd in ProcessWalSndrMessage should do, I guess.
That looks like a straightforward patch.
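
To make the idea of such a check concrete, here is a rough Python sketch (not PostgreSQL code). It assumes the pre-9.3 pointer layout of a log id plus a 32-bit offset and the default 16 MB segment size, which leaves segments 00 to FE usable in each log file. The names points_into_ff_segment and wal_file_name are hypothetical, and boundary handling for end-of-file pointers is deliberately left out.

```python
# Assumption: default 16 MB WAL segments, as in a stock build.
XLOG_SEG_SIZE = 16 * 1024 * 1024
# 255 usable segments per log file (00..FE); segment FF is never valid.
XLOG_SEGS_PER_FILE = 0xFFFFFFFF // XLOG_SEG_SIZE


def points_into_ff_segment(xrecoff):
    """Return True if a 32-bit offset falls into the unused FF segment."""
    return xrecoff // XLOG_SEG_SIZE >= XLOG_SEGS_PER_FILE


def wal_file_name(tli, xlogid, xrecoff):
    """Build the 24-character segment file name for a position."""
    return "%08X%08X%08X" % (tli, xlogid, xrecoff // XLOG_SEG_SIZE)


if __name__ == "__main__":
    for off in (0xFE000000, 0xFF000000):
        print(wal_file_name(1, 0xA, off), points_into_ff_segment(off))
```

A receiver-side check along these lines would reject the position before opening any file, instead of writing whatever the sender asked for.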

> 2. Why are the .done files sometimes not being created?
>
> I may have an explanation for that. Walreceiver creates a .done file when
> it closes an old segment and opens a new one. However, it does this only
> when it's about to start writing to the new segment, and still has the old
> segment open. If you stream the FE segment fully, but drop replication
> connection at exactly that point, the .done file is not created. That might
> sound unlikely, but it's actually pretty easy to trigger. Just do "select
> pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a
> restart.
>
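
The timing window described above can be illustrated with a toy model in Python. This is only a sketch of the described behavior, not walreceiver code: the class ToyWalReceiver and its attributes are hypothetical, and segments are reduced to short labels.

```python
class ToyWalReceiver:
    """Toy model: a segment is marked .done only when a newer one is opened."""

    def __init__(self):
        self.open_segment = None  # segment currently being written
        self.done = set()         # segments that got a .done file

    def write(self, segment):
        # The old segment is marked .done only at the moment data for a
        # different (newer) segment is about to be written.
        if self.open_segment is not None and segment != self.open_segment:
            self.done.add(self.open_segment)
        self.open_segment = segment


if __name__ == "__main__":
    rcv = ToyWalReceiver()
    rcv.write("FD")
    rcv.write("FE")  # FD gets its .done here
    # The connection drops now: FE was fully streamed, but nothing was
    # written to the next segment, so FE never gets a .done file.
    print(sorted(rcv.done))
```

In this model the last fully streamed segment is always left unmarked if the connection ends before the next write, which matches the scenario in the quoted text.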

That's exactly the test I ran a couple of times before sending my previous
email, but without success on the standby with master: all the WAL files
were marked as .done. I have now retried it many more times on
REL9_3_STABLE and on HEAD, and I have been able to trigger the problem a
couple of times. Simply run a long transaction generating a lot of WAL,
like this:
=# create table aa as select generate_series(1,1000000000);
And then run that:
$ psql -c 'select pg_switch_xlog()'; pg_ctl stop -m immediate; pg_ctl start
And with enough "luck", .ready files may appear. It may take a dozen
tries before seeing at least one. I also noticed that multiple .ready
files generally appear at the same time.
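
To check the result after each attempt, a small helper can list the segments that have a given status file on the standby. This is a Python sketch that assumes the usual pg_xlog/archive_status layout. The function name segments_with_status and the example path are hypothetical.

```python
import os


def segments_with_status(pg_xlog_dir, suffix=".ready"):
    """Return the sorted segment names that have a status file with suffix."""
    status_dir = os.path.join(pg_xlog_dir, "archive_status")
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(status_dir)
        if name.endswith(suffix)
    )


if __name__ == "__main__":
    # Hypothetical standby data directory; adjust to the local setup.
    print(segments_with_status("/path/to/standby/pg_xlog"))
```

Running it with suffix=".done" gives the complementary list, which makes it easy to spot which segments were left unmarked after a restart.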

> The creation of the .done files seems quite unreliable anyway. If only a
> portion of a segment is streamed, we don't write a .done file for it, so we
> still have the original problem that we will try to archive the segment
> after failover, even though the master might already have archived it.
>
Yep. Agreed.

> I looked again at the thread where this was discussed:
> http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=A+zo+AvFbVHLGoypO9G_QDKbabeXgXBVGd05g(at)mail(dot)gmail(dot)com(dot)
> I believe the idea was that the server that generates a WAL segment is
> always responsible for archiving it. A standby should never attempt to
> archive a WAL segment that was restored from the archive, or streamed from
> the master.
>
> In that thread, it was not discussed what should happen to WAL files that
> an admin manually copies into pg_xlog of the standby. Should the standby
> archive them? I don't think so - the admin should copy them manually to the
> archive too, if he wants them archived. It's a good and simple rule that
> the server that generates the WAL, archives the WAL.
>

Question time: why has the check based on recovery state of the node been
removed in 1bd42cd? Just assuming, but did you have in mind that relying on
XLogArchiveForceDone and XLogArchiveCheckDone was enough and more robust at
this point?

> Instead of creating any .done files during recovery, we could scan pg_xlog
> at promotion, and create a .done file for every WAL segment that's present
> at that point. That would be more robust. And then apply your patch, to
> recycle old segments during archive recovery, ignoring .done files.
>

The additional processing at promotion sounds like a good idea; I'll try
to get a patch done tomorrow. It would also allow removing the
XLogArchiveForceDone stuff. Either way, now that I have been able to
reproduce the problem manually, things can clearly be solved.
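
The promotion-time scan could look roughly like the following Python sketch. It is not the actual patch: the function mark_all_segments_done is hypothetical, segment files are assumed to be the 24-hex-character names in pg_xlog, and turning an existing .ready file into .done is my own assumption about what the scan should do.

```python
import os
import re

# Assumption: WAL segment files are named with exactly 24 hex digits.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def mark_all_segments_done(pg_xlog_dir):
    """Create a .done status file for every WAL segment present."""
    status_dir = os.path.join(pg_xlog_dir, "archive_status")
    marked = []
    for name in sorted(os.listdir(pg_xlog_dir)):
        if not WAL_SEGMENT_RE.match(name):
            continue  # skip archive_status, .history files and the like
        done = os.path.join(status_dir, name + ".done")
        ready = os.path.join(status_dir, name + ".ready")
        if os.path.exists(done):
            continue  # already marked
        if os.path.exists(ready):
            os.rename(ready, done)  # assumption: never archive it here
        else:
            open(done, "w").close()
        marked.append(name)
    return marked
```

The function returns the names it marked, so a caller could log them once at promotion.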
Regards,
--
Michael
