From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Moving more work outside WALInsertLock |
Date: | 2011-12-16 03:27:37 |
Message-ID: | 23383.1324006057@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> You missed your cue to discuss leaving the prev link out of the CRC altogether.
> On its own that sounds dangerous, but its not. When we need to confirm
> the prev link we already know what we expect it to be, so CRC-ing it
> is overkill. That isn't true of any other part of the WAL record, so
> the prev link is the only thing we can relax, but thats OK because we
> can CRC check everything else outside of the locked section.
> That isn't my idea, but I'm happy to put it on the table since I'm not shy.
I'm glad it's not your idea, because it's a bad one. A large part of
the point of CRC'ing WAL records is to guard against torn-page problems
in the WAL files, and doing things like that would give up a significant
part of that protection, because there would no longer be any assurance
that the body of a WAL record had anything to do with its prev_link.
Consider a scenario like this:
* We write a WAL record that starts 8 bytes before a sector boundary,
so that the prev_link is in one sector and the rest of the record in
the next one(s).
* Time passes, and we recycle that WAL file.
* We write another WAL record that starts 8 bytes before the same sector
boundary, so that the prev_link is in one sector and the rest of the
record in the next one(s).
* System crashes, after having written out the earlier sector but not
the later one(s).
On restart, the replay code will see a prev_link that matches what it
expects. If the CRC for the remainder of the record is not dependent
on the prev_link, then the remainder of the old record will look good
too, and we'll attempt to replay it, n*16MB too late.
Including the prev_link in the CRC adds a significant amount of
protection against such problems. We should not remove this protection
in the name of shaving a few cycles.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2011-12-16 04:35:46 | Re: Storing hot members of PGPROC out of the band |
Previous Message | Andrew Dunstan | 2011-12-16 01:16:22 | Re: includeifexists in configuration file |