Re: Problem while setting the fpw with SIGHUP

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, hlinnaka <hlinnaka(at)iki(dot)fi>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Problem while setting the fpw with SIGHUP
Date: 2018-09-18 06:34:57
Message-ID: 20180918063457.GL31460@paquier.xyz
Lists: pgsql-hackers

On Fri, Sep 14, 2018 at 04:30:37PM +0530, Amit Kapila wrote:
> On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> So, I have been working on this problem again and I have reviewed the
>> thread, and there have been many things discussed in the last couple of
>> months:
>> 1) We do not want to initialize XLogInsert stuff unconditionally for all
>> processes at the moment recovery begins, but we just want to initialize
>> it once WAL write is open for business.
>> 2) Both the checkpointer and the startup process can call
>> UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
>> incorrect values.
>
> Can you share the steps to reproduce this problem?

This refers to the first problem reported on this thread:
https://www.postgresql.org/message-id/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com

In order to reproduce the problem, you can for example stop the server
in immediate mode. Then attach a debugger to it and add a breakpoint to
UpdateFullPageWrites. You can check that the XLOG insertion state has not
been initialized yet by looking at xloginsert_cxt or ThisTimeLineID. On a
second session, switch full_page_writes to on or off, reload the
parameters and then trigger a checkpoint. The important point is to
trigger an inconsistency between XLogCtl->Insert->fullPageWrites and
the value of fullPageWrites within the checkpointer context. With the
checkpoint triggered, the debugger will stop at UpdateFullPageWrites
immediately. At this point, you can simply check whether fullPageWrites
and Insert->fullPageWrites have the same value or not. If the
values match, simply switch full_page_writes and reload again, with the
checkpointer still waiting at the beginning of UpdateFullPageWrites.
SIGHUP will make the checkpointer process hang a bit, and then it will
move on. At this point you will be able to see the failure:
TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731)
2018-09-18 15:06:39 JST [7396]: [11-1] db=,user=,app=,client= LOG:
checkpointer process (PID 7399) was terminated by signal 6: Aborted
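
The failure path can be modelled in a few lines of C. This is a simplified, hypothetical model and not the server code: it assumes, as the TRAP above suggests, that the first use of the XLOG insertion machinery in the checkpointer allocates memory (the xloginsert_cxt setup), and that this happens after UpdateFullPageWrites has already entered its critical section. The real call chain is collapsed into one lazy-initialization helper, and every name (crit_section_count, model_alloc, ensure_xlog_insert_initialized, model_update_full_page_writes) is made up for illustration.

```c
/* Hypothetical model of the failure path, not PostgreSQL source. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for PostgreSQL's CritSectionCount. */
static int crit_section_count = 0;

/* Stands in for "xloginsert_cxt has been set up in this process". */
static bool xlog_insert_initialized = false;

/* Set when an allocation happens inside a critical section, i.e. where the
 * server trips FailedAssertion("!(CritSectionCount == 0)") in mcxt.c. */
static bool alloc_in_crit_section = false;

static void *insert_state = NULL;

static void start_crit_section(void) { crit_section_count++; }

static void end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/* Hypothetical allocator that records the mcxt.c assertion condition. */
static void *model_alloc(size_t size)
{
    if (crit_section_count != 0)
        alloc_in_crit_section = true;
    return malloc(size);
}

/* Hypothetical lazy initialization of the WAL insertion state. */
static void ensure_xlog_insert_initialized(void)
{
    if (!xlog_insert_initialized)
    {
        insert_state = model_alloc(64);
        xlog_insert_initialized = true;
    }
}

/* Checkpointer calling UpdateFullPageWrites() with a local
 * full_page_writes value that may differ from the shared one. */
static void model_update_full_page_writes(bool local_fpw, bool *shared_fpw)
{
    if (local_fpw == *shared_fpw)
        return;                         /* values match: nothing to do */

    start_crit_section();
    ensure_xlog_insert_initialized();   /* too late: already in a critical section */
    *shared_fpw = local_fpw;
    end_crit_section();
}
```

A call where both values already match returns before entering the critical section, while model_update_full_page_writes(false, &shared) with shared still true flags an allocation inside it. This is why the reproduction needs the checkpointer's fullPageWrites and Insert->fullPageWrites to diverge.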

> On a regular startup when there is no recovery, it won't allow us to
> log the WAL record (XLOG_FPW_CHANGE) which can happen without above
> change. You can check that by setting full_page_writes=off and start
> the system.

Oh, good point, InRecovery is set to false in this case so that would be
skipped. We can simply fix that by adding a flag, say "force", to
UpdateFullPageWrites() so that a process can force the update of FPW
even if RecoveryInProgress() returns true, which would be the case for
the startup process.
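
A minimal sketch of the proposed "force" flag, again as a hypothetical model and not the actual patch: one boolean decides whether the XLOG_FPW_CHANGE record is logged even though RecoveryInProgress() still reports true, as it does for the startup process. The variables standing in for Insert->fullPageWrites, RecoveryInProgress() and the WAL record, as well as the function name update_full_page_writes, are assumptions for illustration.

```c
/* Hypothetical model of the "force" proposal, not PostgreSQL source. */
#include <assert.h>
#include <stdbool.h>

/* Stands in for XLogCtl->Insert->fullPageWrites (shared). */
static bool shared_full_page_writes = true;

/* Stands in for RecoveryInProgress(): still true for the startup
 * process while it finishes StartupXLOG(). */
static bool recovery_in_progress = true;

/* Counts modelled XLOG_FPW_CHANGE records. */
static int fpw_change_records = 0;

/* local_fpw models this process's full_page_writes setting. */
static void update_full_page_writes(bool local_fpw, bool force)
{
    if (local_fpw == shared_full_page_writes)
        return;                     /* nothing changed */

    /* Without force, a process that still sees recovery in progress
     * updates the shared value but skips the WAL record. */
    if (force || !recovery_in_progress)
        fpw_change_records++;

    shared_full_page_writes = local_fpw;
    assert(shared_full_page_writes == local_fpw);
}
```

With full_page_writes=off at a regular startup, update_full_page_writes(false, false) changes the shared value but logs nothing, which is the case Amit describes; passing force = true from the startup process logs the record.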
--
Michael
