Re: Idea for improving buildfarm robustness

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Idea for improving buildfarm robustness
Date: 2015-09-30 13:59:18
Message-ID: 3196.1443621558@sss.pgh.pa.us
Lists: pgsql-hackers

Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> writes:
> Ouch. So it sounds like there's value to seeing if pg_control isn't what
> we expect it to be.

> Instead of looking at the inode (portability problem), what if
> pg_control contained a random number that was created at initdb time? On
> startup postmaster would read that value and then if it ever changed
> after that you'd know something just went wrong.

> Perhaps even stronger would be to write a new random value on startup;
> that way you'd know if an old copy accidentally got put in place.

Or maybe better than an otherwise-useless random value, write the
postmaster's start time.

But none of these would offer that much added safety IMV. If you don't
restart the postmaster very often, it's not unlikely that what you were
trying to restore is a backup from earlier in the current postmaster's
run. Another problem with checking the contents of pg_control, rather
than only its presence, is that the checkpointer will overwrite it every
so often, and thereby put back whatever we were expecting to find there.
If the postmaster's recheck interval is significantly less than the
checkpoint interval, then you'll *probably* notice before the evidence
vanishes, but it's hardly guaranteed.

It strikes me that a different approach that might be of value would
be to re-read postmaster.pid and make sure that (a) it's still there
and (b) it still contains the current postmaster's PID. This would
be morally equivalent to what Jim suggests above, and it would dodge
the checkpointer-destroys-the-evidence problem, and it would have the
additional advantage that we'd notice when a brain-dead DBA decides
to manually remove postmaster.pid so he can start a new postmaster.
(It's probably far too late to avoid data corruption at that point,
but better late than never.)
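
A minimal sketch of that recheck, written in Python for brevity rather than
the postmaster's C, and not an actual implementation. The function name and
its arguments are hypothetical; the only thing assumed about the file is that
the owning postmaster's PID is on its first line.

```python
def pidfile_still_ours(pidfile_path, my_pid):
    """Return True only if the PID file exists and still names my_pid.

    Hypothetical helper for illustration; assumes the first line of the
    file holds the owning postmaster's PID.
    """
    try:
        with open(pidfile_path, "r") as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        # Check (a): the file has been removed out from under us.
        return False

    try:
        # Check (b): the file must still contain our own PID.
        return int(first_line) == my_pid
    except ValueError:
        # Empty or garbled first line: treat it as "not ours".
        return False
```

A caller would run this periodically with its own PID (for example
os.getpid()) and treat a False result as a reason to shut down. Like the
idea itself, it cannot detect a data directory overwritten with a backup
whose postmaster.pid happens to carry the same PID.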

This is still not bulletproof against all overwrite-with-a-backup
scenarios, but it seems like a noticeable improvement over what we
discussed yesterday.

regards, tom lane
