From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add checkpoint and redo LSN to LogCheckpointEnd log message |
Date: | 2022-02-02 14:46:35 |
Message-ID: | CALj2ACVVMAGDvourutDvtDUOhTqS_yTtfrdCjtP75-RrQU0dFA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> I found that CreateRestartPoint() already reported the redo lsn as follows after emitting the restartpoint log message. To avoid duplicated logging of the same information, we should update this code?
>
> ereport((log_checkpoints ? LOG : DEBUG2),
> (errmsg("recovery restart point at %X/%X",
> LSN_FORMAT_ARGS(lastCheckPoint.redo)),
> xtime ? errdetail("Last completed transaction was at log time %s.",
> timestamptz_to_str(xtime)) : 0));
>
> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, LogCheckpointEnd() reports ControlFile->checkPointCopy.redo. They may be different, for example, when the current DB state is not DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as redo lsn?
Do we ever reach CreateRestartPoint when ControlFile->stat !=
DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
regression tests.
Here's what can happen:
lastCheckPoint.redo is 100 and ControlFile->checkPointCopy.redo is
105, so, "skipping restartpoint, already performed at %X/%X"
LogCheckpointEnd isn't reached
lastCheckPoint.redo is 105 and ControlFile->checkPointCopy.redo is 100
and ControlFile->state == DB_IN_ARCHIVE_RECOVERY, then the control
file gets updated and LogCheckpointEnd prints the right redo lsn.
lastCheckPoint.redo is 105 and ControlFile->checkPointCopy.redo is 100
and ControlFile->state != DB_IN_ARCHIVE_RECOVERY, the the control file
doesn't get updated and LogCheckpointEnd just prints the control redo
lsn. Looks like this case is rare given Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); doesn't fail any tests.
I think we should just let LogCheckpointEnd print the redo lsn from
the control file. We can just remove the above errmsg("recovery
restart point at %X/%X" message altogether or just print it only in
the rare scenario, something like below:
if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY)
{
ereport((log_checkpoints ? LOG : DEBUG2),
(errmsg("performed recovery restart point at %X/%X while
the database state is %s",
LSN_FORMAT_ARGS(lastCheckPoint.redo)),
getDBState(ControlFile->state)));
}
And the last commit/abort records's timestamp will always get logged
even before we reach here in the main redo loop (errmsg("last
completed transaction was at log time %s").
Or another way is to just pass the redo lsn to LogCheckpointEnd and
pass the lastCheckPoint.redo in if (ControlFile->state !=
DB_IN_ARCHIVE_RECOVERY) cases or when control file isn't updated but
restart point happened.
Thoughts?
Regards,
Bharath Rupireddy.
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2022-02-02 15:01:14 | Re: Ensure that STDERR is empty during connect_ok |
Previous Message | Daniel Gustafsson | 2022-02-02 14:40:39 | Ensure that STDERR is empty during connect_ok |