Re: Idea for getting rid of VACUUM FREEZE on cold pages

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date: 2010-06-08 12:58:02
Message-ID: 20100608125802.GM21875@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Greg Stark (gsstark(at)mit(dot)edu) wrote:
> I assume you mean back out the changes incrementally until you find a
> full_page_write and see if it matches?

To be honest, you're already assuming I know more about how this all
works than I do. :) The gist of my thought was simply- we write out
block changes to the WAL, including data in many cases. If we were to
look at the very end of the WAL, at the last piece of data written
there, and the data files have supposedly been flushed, then what's in
the WAL at that point should match what's in the data files, right? If
it doesn't, that'd be bad.

> And continue comparing with
> full_page_writes once per checkpoint?

If we could only do it when there's a full page write, then perhaps that
would work as well, but I thought we tracked them at a lower level. In
any case, the idea is the same- compare what's in WAL to what's supposed
to be on disk, and alarm whenever there's a clear error.

> I don't think the WAL has enough
> information to replay backwards though. For example vacuum cleanup
> records just list the tids to remove. They don't have the contents to
> replace there.

Right, you couldn't actually move the database backwards in time using
this tool (because we only write out new data, we don't write out what
was in that block/page before the write)- that isn't the idea or intent.
It would just be a tool that someone could run against a database where
they've detected corruption (or, I dunno, more frequently, to perhaps
catch corruption faster?), to see if the problem is a PG bug or a
hardware/kernel/etc issue. In fact, if you could somehow do this
against a backup that's been taken using pg_start_backup/pg_stop_backup,
that would be pretty awesome.

I know that if such a tool existed, I'd be happy to run it as part of my
regular backup routines- I *always* have all the WALs from my last
backup to my next backup (and typically farther back than that, eg: if I
run full backups weekly, I'll have 4 full backups + all 4 weeks of WALs,
to be able to replay back to any point in the month..).

The big question that I have is- would this actually be productive?
Would it actually be able to catch hardware corruption or help at all
with PG bugs? Those are the things I'm not really sure about.

Thanks,

Stephen

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Amiel 2010-06-08 13:26:25 3rd time is a charm.....right sibling is not next child crash.
Previous Message Greg Stark 2010-06-08 12:53:39 Re: Parameters of GiST indexes