> After thinking about this a little, I believe I see why Vadim did it
> the way he did. Suppose we tried to make the code sequence be
>     obtain write lock on buffer;
>     XLogOriginalPage(buffer);  // copy page to xlog if first since ckpt
>     modify buffer;
>     XLogInsert(xlog entry for modification);
>     mark buffer dirty and release write lock;
> so that the saving of the original page is a separate xlog entry from
> the modification data. Looks easy, and it'd sure simplify XLogInsert
> a lot. The only problem is it's wrong. What if a checkpoint occurs
> between the two XLOG records?
> The decision whether to log the whole buffer has to be atomic with the
> actual entry of the xlog record. Unless we want to hold the xlog insert
> lock for the entire time that we're (eg) splitting a btree page, that
> means we log the buffer after the modification work is done, not before.
Yes, I see. I can't currently come up with a workaround either. Hmm ..
Duplicating the buffer is probably not a workable solution.
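
To make the race in the quoted text concrete, here is a toy Python model. It is not PostgreSQL code: `Wal`, `needs_image`, `two_record_scheme`, `atomic_scheme` and `replay_is_safe` are names invented for this sketch. It assumes a single page, that recovery replays from the last checkpoint's redo pointer, and that the first record touching the page after that pointer must carry a full page image.

```python
from dataclasses import dataclass, field


@dataclass
class Wal:
    """Toy write-ahead log: a list of records plus a redo pointer."""
    records: list = field(default_factory=list)
    redo_ptr: int = 0  # recovery replays records[redo_ptr:]

    def checkpoint(self):
        # Everything logged so far is assumed flushed; replay starts here.
        self.redo_ptr = len(self.records)

    def insert(self, kind):
        self.records.append(kind)
        return len(self.records)  # "LSN" = position just past the record


def needs_image(wal, page_lsn):
    # True if the page has not been logged since the last checkpoint.
    return page_lsn <= wal.redo_ptr


def two_record_scheme(wal, page_lsn, checkpoint_between):
    # Hypothetical XLogOriginalPage(): decide and log the image first ...
    if needs_image(wal, page_lsn):
        page_lsn = wal.insert("image")
    # ... a checkpoint may slip in here ...
    if checkpoint_between:
        wal.checkpoint()
    # ... and the modification is logged as a separate record.
    return wal.insert("delta")


def atomic_scheme(wal, page_lsn, checkpoint_between):
    # Decision and insert happen together, so a checkpoint can only
    # fall before the pair, never between them.
    if checkpoint_between:
        wal.checkpoint()
    kind = "image+delta" if needs_image(wal, page_lsn) else "delta"
    return wal.insert(kind)


def replay_is_safe(wal):
    # The first record replayed for the page must include a page image.
    tail = wal.records[wal.redo_ptr:]
    return not tail or tail[0].startswith("image")


racy = Wal()
two_record_scheme(racy, page_lsn=0, checkpoint_between=True)

atomic = Wal()
atomic_scheme(atomic, page_lsn=0, checkpoint_between=True)
```

In this model `replay_is_safe(racy)` is false, because the page image sits before the redo pointer and replay would start at the bare delta, while `replay_is_safe(atomic)` is true.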
I do not however see how the current solution fixes the original problem,
that we don't have a rollback for index modifications.
The index would potentially point to an empty heaptuple slot.
When this slot, because it is marked empty, is reused after startup, the index
points to the wrong record.
Unless, of course, startup rollforward visits all heap pages pointed at
by index xlog records and inserts a tuple marked deleted into the heap.
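
A small Python sketch of this concern, under loud assumptions: the heap is a plain list of slots, the index is a list of (key, slot) pairs, and `recover`, `heap_insert` and `index_lookup` are hypothetical helpers, not anything from the PostgreSQL source. It only illustrates how a surviving index entry can end up pointing at an unrelated tuple, and how planting a deleted placeholder during rollforward would prevent the slot from being reused.

```python
FREE = None       # empty heap slot
DEAD = object()   # placeholder for a tuple marked deleted


def recover(index, heap_size, plant_dead_tuples):
    """Rebuild a heap after a crash in which only the index entries survived."""
    heap = [FREE] * heap_size
    if plant_dead_tuples:
        # Proposed workaround: occupy every slot an index entry points at.
        for _key, slot in index:
            if heap[slot] is FREE:
                heap[slot] = DEAD
    return heap


def heap_insert(heap, row):
    """Store row in the first free slot, extending the heap if necessary."""
    for slot, tup in enumerate(heap):
        if tup is FREE:
            heap[slot] = row
            return slot
    heap.append(row)
    return len(heap) - 1


def index_lookup(heap, index, key):
    """Return the live heap rows that the index claims match key."""
    rows = []
    for k, slot in index:
        tup = heap[slot]
        if k == key and tup is not FREE and tup is not DEAD:
            rows.append(tup)
    return rows


# The index entry for "alice" survived the crash; her heap tuple did not.
index = [("alice", 0)]

heap_plain = recover(index, heap_size=1, plant_dead_tuples=False)
heap_insert(heap_plain, "row for bob")        # reuses slot 0
wrong = index_lookup(heap_plain, index, "alice")

heap_fixed = recover(index, heap_size=1, plant_dead_tuples=True)
heap_insert(heap_fixed, "row for bob")        # slot 0 is taken, goes to slot 1
right = index_lookup(heap_fixed, index, "alice")
```

Without the placeholder, the lookup for "alice" returns the row inserted for bob; with it, the lookup returns nothing, which is the correct answer for a tuple that never made it into the heap.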
Additionally I do not see how this all works for userland index types.
In short I do not think that the current implementation of "physical log" does
what it was intended to do :-(