Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date: 2014-10-20 08:15:20
Message-ID: CAB7nPqT_tYoN2p4cDKVnysqg4RqZHwfM4YccAusPG7BWQbB=NA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 17, 2014 at 10:12 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>
> >> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>>
> >>> The additional process at promotion sounds like a good idea, I'll try to
> >>> get a patch done tomorrow. This would result as well in removing the
> >>> XLogArchiveForceDone stuff. Either way, not that I have been able to
> >>> reproduce the problem manually, things can be clearly solved.
> >>
> >> Please find attached two patches aimed to fix this issue and to improve the
> >> situation:
> >> - 0001 prevents the apparition of those phantom WAL segment file by ensuring
> >> that when a node is in recovery it will remove it whatever its status in
> >> archive_status. This patch is the real fix, and should be applied down to
> >> 9.2.
> >> - 0002 is a patch implementing Heikki's idea of enforcing all the segment
> >> files present in pg_xlog to have their status to .done, marking them for
> >> removal. When looking at the code, I finally concluded that Fujii-san's
> >> point, about marking in all cases as .done segment files that have been
> >> fully streamed, actually makes more sense to not be backward. This patch
> >> would actually not be mandatory for back-patching, but it makes the process
> >> more robust IMO.
> >
> > Thanks for the patches!
>
> While reviewing the patch, I found another bug related to WAL file in recovery
> mode. The problem is that exitArchiveRecovery() always creates .ready file for
> the last WAL file of the old timeline even when it's restored from the archive
> and has .done file. So this causes the already-archived WAL file to be archived
> again.... Attached patch fixes this bug.
>
That's a good catch! The patch looks good. I think that you should change
xlogpath to use MAXFNAMELEN instead of MAXPGPATH in exitArchiveRecovery(),
because this variable only holds a segment file name, not a path. This is
purely a matter of correctness, so it is a master-only remark. Renaming the
variable from xlogpath to xlogname would make sense as well.
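
To illustrate the sizing remark, here is a minimal standalone sketch, not the
actual patch. The constant values (MAXPGPATH = 1024, MAXFNAMELEN = 64, 256
segments per xlogid with 16MB segments) are assumptions mirroring what I
believe the PostgreSQL headers define, and wal_segment_name() is a
hypothetical helper that imitates the usual timeline/log/segment naming.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed values, believed to match the PostgreSQL headers. */
#define MAXPGPATH   1024    /* large enough for a full path */
#define MAXFNAMELEN 64      /* large enough for a bare WAL file name */

/* Assumption: 16MB segments, so 0x100000000 / 16MB = 256 per xlogid. */
#define SEGMENTS_PER_XLOGID UINT64_C(256)

/*
 * Hypothetical helper: write the 24-character segment file name
 * (8 hex digits each for timeline, log id and segment) into buf.
 * Returns the number of characters snprintf wanted to write.
 */
int
wal_segment_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno)
{
    return snprintf(buf, buflen, "%08X%08X%08X",
                    (unsigned int) tli,
                    (unsigned int) (segno / SEGMENTS_PER_XLOGID),
                    (unsigned int) (segno % SEGMENTS_PER_XLOGID));
}
```

With these assumptions, segment 255 on timeline 1 is named
0000000100000000000000FF. The name is always 24 characters, so a buffer of
MAXFNAMELEN bytes is sufficient and MAXPGPATH is only needed once a directory
is prepended.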
Regards,
--
Michael
