Re: backup manifests

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-04-02 18:23:46
Message-ID: 20200402182346.6iffoadxu2hsbi2s@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-04-02 14:16:27 -0400, Robert Haas wrote:
> On Thu, Apr 2, 2020 at 1:23 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I suspect it's possible to control the timing by preventing the
> > checkpoint at the end of recovery from completing within a relevant
> > timeframe. I think configuring a large checkpoint_timeout and using a
> > non-fast base backup ought to do the trick. The state can be advanced by
> > separately triggering an immediate checkpoint? Or by changing the
> > checkpoint_timeout?
>
> That might make the window fairly wide on normal systems, but I'm not
> sure about Raspberry Pi BF members or things running
> CLOBBER_CACHE_ALWAYS/RECURSIVELY. I guess I could try it.

You can set checkpoint_timeout to be a day. If that's not enough, well,
then I think we have other problems.

> > FWIW, the only check I'd really like to see in this release is the
> > cross-check between the file's length and the data actually read (to
> > be able to diagnose FS issues).
>
> Not sure I understand this comment. Isn't that a subset of what the
> patch already does? Are you asking for something to be changed?

Yes, I am asking for something to be changed: I'd like the code that
read()s the file when computing the checksum to add up how many bytes
were read, and compare that to the size in the manifest. And if there's
a difference report an error about that, instead of a checksum failure.

I've repeatedly seen filesystem issues lead to earlier EOFs when
read()ing than what stat() returns. It'll be pretty annoying to have to
debug a general "checksum failure", rather than just knowing that
reading stopped after 100MB of 1GB.
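
To make the request concrete, here is a minimal sketch of the check, not
the patch's actual code. All names (verify_file, verify_result,
checksum_update) are hypothetical, the FNV-1a hash is only a stand-in for
whichever checksum the manifest records, and stdio's fread() is used in
place of read() to keep the example portable. The point is the ordering:
the byte count is compared against the manifest size before the checksum
is looked at, so a short read is reported as a size mismatch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result codes; the real patch reports errors differently. */
typedef enum
{
    VERIFY_OK,
    VERIFY_OPEN_FAILED,
    VERIFY_READ_ERROR,
    VERIFY_SIZE_MISMATCH,
    VERIFY_CHECKSUM_MISMATCH
} verify_result;

/* FNV-1a offset basis, used here as the initial checksum state. */
#define CHECKSUM_INIT 2166136261u

/* Stand-in checksum (FNV-1a); assumed only for illustration. */
static uint32_t
checksum_update(uint32_t h, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Read the whole file, accumulating both the checksum and the number of
 * bytes actually read.  The total is stored in *bytes_read so the caller
 * can say "read X of Y bytes" in its error message.
 */
verify_result
verify_file(const char *path, uint64_t manifest_size,
            uint32_t manifest_checksum, uint64_t *bytes_read)
{
    unsigned char buf[8192];
    uint32_t    h = CHECKSUM_INIT;
    uint64_t    total = 0;
    size_t      n;
    int         failed;
    FILE       *fp;

    assert(path != NULL && bytes_read != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return VERIFY_OPEN_FAILED;

    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
    {
        h = checksum_update(h, buf, n);
        total += n;
    }
    failed = ferror(fp);
    fclose(fp);

    *bytes_read = total;

    if (failed)
        return VERIFY_READ_ERROR;
    /* Check the size first: an early EOF is not a "checksum failure". */
    if (total != manifest_size)
        return VERIFY_SIZE_MISMATCH;
    if (h != manifest_checksum)
        return VERIFY_CHECKSUM_MISMATCH;
    return VERIFY_OK;
}
```

With this shape, a caller that gets VERIFY_SIZE_MISMATCH can report both
*bytes_read and the manifest size, which is exactly the "stopped after
100MB of 1GB" information.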

Greetings,

Andres Freund
