Re: "using previous checkpoint record at" maybe not the greatest idea?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "using previous checkpoint record at" maybe not the greatest idea?
Date: 2016-02-07 05:24:51
Message-ID: CAA4eK1+w_TmR5jcT9NqpU5msU4+tZ_12YuTHrmvpyeue8A9dWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 2, 2016 at 5:28 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> currently if, when not in standby mode, we can't read a checkpoint
> record, we automatically fall back to the previous checkpoint, and start
> replay from there.
>
> Doing so without user intervention doesn't actually seem like a good
> idea. While not super likely, it's entirely possible that doing so can
> wreck a cluster, that'd otherwise easily recoverable. Imagine e.g. a
> tablespace being dropped - going back to the previous checkpoint very
> well could lead to replay not finishing, as the directory to create
> files in doesn't even exist.
>

I think there are similar hazards for deletion of relation when
relfilenode gets reused. Basically, it can delete the data
for one of the newer relations which is created after the
last checkpoint.

> As there's, afaics, really no "legitimate" reasons for needing to go
> back to the previous checkpoint I don't think we should do so in an
> automated fashion.
>

I have tried to find out why at the first place such a mechanism has
been introduced and it seems to me that commit
4d14fe0048cf80052a3ba2053560f8aab1bb1b22 has introduced it, but
the reason is not apparent. Then I digged through the archives
and found mail chain which I think has lead to this commit.
Refer [1][2].

If we want to do something for fallback-to-previous-checkpoint
mechanism, then I think it is worth considering whether we want
to retain xlog files from two checkpoints as that also seems to
have been introduced in the same commit.

> All the cases where I could find logs containing "using previous
> checkpoint record at" were when something else had already gone pretty
> badly wrong. Now that obviously doesn't have a very large significance,
> because in the situations where it "just worked" are unlikely to be
> reported...
>
> Am I missing a reason for doing this by default?
>

I am not sure, but may be such hazards won't exist at the time
fallback-to-previous-checkpoint mechanism has been introduced.

I think even if we want to make it non-default, it will be very
difficult for users to decide whether to turn it on or not. Basically,
I think if such a situation occurs, what ever solution we try to
provide to user, it might not be full-proof, but OTOH we should
provide some way to allow user to start database and dump the
existing contents. Some of the options that comes to mind are
provide some way to get the last checkpoint record from WAL
or provide a way to compute max-lsn from data-pages and use
that with pg_resetxlog utility to allow user to start database.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-02-07 05:32:04 Re: "using previous checkpoint record at" maybe not the greatest idea?
Previous Message Robert Haas 2016-02-07 05:10:34 Re: Patch: fix lock contention for HASHHDR.mutex