
RE: beta testing version

From:	"Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>
To:	"'Zeugswetter Andreas SB'" <ZeugswetterA(at)wien(dot)spardat(dot)at>, "'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: beta testing version
Date:	2000-12-07 03:50:42
Message-ID:	8F4C99C66D04D4118F580090272A7A234D31D4@sectorbase1.sectorbase.com
Lists:	pgsql-hackers

> > > > Sounds great! We can do it this way: when the first update to a
> > > > page after the last checkpoint is logged, the XLOG code can log
> > > > the entire page instead of an AM-specific update record (creating
> > > > a backup "physical log"). During after-crash recovery such pages
> > > > will be redone first, ensuring page consistency for further redo
> > > > operations. This means a bigger log, of course.
> > >
> > > Be sure to include a CRC of each part of the block that you hope
> > > to replay individually.
> >
> > Why should we do this? I'm not going to replay parts individually;
> > I'm going to write entire pages to the OS cache and then apply
> > changes to them. Recovery is considered to have succeeded once the
> > server has ensured that all applied changes are on disk. In the case
> > of a crash during recovery we'll simply replay the whole recovery.
>
> Yes, but there would need to be a way to verify the last page
> or record from the txlog when running on crap hardware. The point was
> that crap hardware writes our 8k pages in any order (e.g. 512 bytes
> from the end, then 512 bytes from the front ...), and does not
> even notice that it only wrote part of one such 512-byte block when
> reading it back after a crash. But I actually doubt that this happens
> on anything but the most crappy hardware.
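The "log the whole page on its first change after a checkpoint" rule quoted above can be sketched in C. This is a minimal illustration under stated assumptions, not the actual XLOG code: `XLogPos`, `PageHeader`, `need_full_page_image` and `log_page_change` are hypothetical names, and each page is assumed to carry the log position of the last record that changed it, which is compared against the redo position of the last checkpoint.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical log position; assumed to grow monotonically. */
typedef uint64_t XLogPos;

/* Hypothetical page header: only the field this check needs. */
typedef struct {
    XLogPos lsn;    /* log position of the last record that changed this page */
} PageHeader;

typedef enum { LOG_UPDATE_RECORD, LOG_FULL_PAGE } LogKind;

/* True if the page has not been changed since the checkpoint's redo
 * position, i.e. the next change is the first one after the checkpoint. */
static bool need_full_page_image(const PageHeader *page, XLogPos checkpoint_redo)
{
    return page->lsn <= checkpoint_redo;
}

/* Decides what to log for a change whose record will be written at
 * new_pos, then stamps the page with that position. */
static LogKind log_page_change(PageHeader *page, XLogPos checkpoint_redo,
                               XLogPos new_pos)
{
    LogKind kind = need_full_page_image(page, checkpoint_redo)
                       ? LOG_FULL_PAGE
                       : LOG_UPDATE_RECORD;
    page->lsn = new_pos;   /* page now reflects this record */
    return kind;
}
```

Only the first change to a page in each checkpoint cycle pays for a full image; later changes in the same cycle log the small update record, which is where the "bigger log, of course" cost is bounded.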

Oh, I didn't consider log consistency at that time. Anyway, we need a CRC
for the entire log record, not for its 512-byte parts.

Well, I didn't worry about non-atomic 8K block writes in the current WAL
implementation: we were never protected from this. The backend inserts a
tuple, but only the line pointers reach disk, so the new line pointer
points to garbage inside the not-yet-updated page content. Yes, the
transaction was not committed, but who knows what that garbage contains
and what a scan will get when it tries to read it. The same goes for
index pages.
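The torn-page case described above can be simulated in a few lines. This is an illustrative sketch only: the 2-byte line pointer at offset 0 and the tuple at offset 8000 are made-up layout choices (not the real heap page format), and the hypothetical `torn_write` simply pretends the crash happens after the first `sectors_written` 512-byte sectors of an 8K page reach disk.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   8192
#define TOY_SECTOR_SIZE 512

/* Simulates a non-atomic page write: only the first sectors_written
 * sectors of new_page reach disk_page before the "crash". */
static void torn_write(unsigned char *disk_page, const unsigned char *new_page,
                       int sectors_written)
{
    memcpy(disk_page, new_page, (size_t)sectors_written * TOY_SECTOR_SIZE);
}

/* Hypothetical layout: a 2-byte line pointer at offset 0 holds the
 * offset of the tuple it points to. */
static uint16_t line_pointer(const unsigned char *page)
{
    uint16_t off;
    memcpy(&off, page, sizeof off);
    return off;
}
```

If only the first sector survives, the line pointer on disk already points at offset 8000, but the bytes there are still the stale page content, which is exactly the "new line pointer points to garbage" situation.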

Can we come to an agreement about a CRC in log records? It's probably
not too late to add it (it would require an initdb).

On seeing a bad CRC, the recovery procedure will assume that the current
record (and all records after it, if any) is garbage, i.e. it comes from
an interrupted disk write, and may be ignored. The backend writes data
pages only after the changes are logged, so if the changes weren't
successfully logged, then the on-disk image of the data pages was not
updated and we are not interested in those log records.
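The CRC check and the "stop at the first bad record" rule above can be sketched in C. This is a minimal sketch under assumptions: the record layout (`RecordHeader` with a CRC covering the length field and the payload) and the helpers `append_record` and `count_valid_records` are hypothetical, and the CRC is the standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit for brevity rather than with a lookup table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard reflected CRC-32; calls can be chained over several buffers. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical on-disk record header; the CRC covers len and the payload. */
typedef struct {
    uint32_t crc;
    uint32_t len;   /* payload bytes following the header */
} RecordHeader;

static uint32_t record_crc(const RecordHeader *hdr, const unsigned char *payload)
{
    uint32_t crc = crc32_update(0, &hdr->len, sizeof hdr->len);
    return crc32_update(crc, payload, hdr->len);
}

/* Appends one record at *used; returns false if it does not fit. */
static bool append_record(unsigned char *log, size_t cap, size_t *used,
                          const void *payload, uint32_t len)
{
    RecordHeader hdr = { 0, len };
    if (cap - *used < sizeof hdr + len)
        return false;
    hdr.crc = record_crc(&hdr, payload);
    memcpy(log + *used, &hdr, sizeof hdr);
    memcpy(log + *used + sizeof hdr, payload, len);
    *used += sizeof hdr + len;
    return true;
}

/* Returns how many records recovery should redo. The first record that is
 * truncated or fails its CRC is treated as an interrupted write: it and
 * everything after it are ignored. */
static size_t count_valid_records(const unsigned char *log, size_t used)
{
    size_t pos = 0, n = 0;
    while (used - pos >= sizeof(RecordHeader)) {
        RecordHeader hdr;
        memcpy(&hdr, log + pos, sizeof hdr);
        if (hdr.len > used - pos - sizeof hdr)
            break;                                   /* truncated record */
        if (record_crc(&hdr, log + pos + sizeof hdr) != hdr.crc)
            break;                                   /* torn or garbage */
        /* a real redo loop would apply the record here */
        pos += sizeof hdr + hdr.len;
        n++;
    }
    return n;
}
```

Because data pages are written only after their log records, discarding everything from the first bad record onward cannot lose a change that already reached a data page.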

This could be implemented very quickly (if someone points me to where
I can find a CRC function). And I could implement the "physical log"
by next Monday.

Comments?

Vadim

Responses

Re: beta testing version at 2000-12-07 04:26:11 from Tom Lane
CRC was: Re: beta testing version at 2000-12-07 07:40:49 from Horst Herb
