On Thu, 2006-03-30 at 13:51 -0500, Tom Lane wrote:
> This is pretty much what heapam and btree currently do, but on looking
> at it I think it's got a problem: we really ought to mark the buffer
> dirty before releasing the critical section. Otherwise, if there's an
> elog(ERROR) before the WriteBuffer call is reached, the backend would go
> on about its business, and we'd have changes in a disk buffer that isn't
> marked dirty. The changes would be uncommitted, presumably, because of
> the error --- but nonetheless this could result in inconsistency down
> the road. One example scenario is:
>   1. We insert a tuple at, say, index 3 on a page.
>   2. elog after making the XLOG entry, but before WriteBuffer.
>   3. page is later discarded from shared buffers; since it's not
>      marked dirty, it'll just be dropped without writing it.
>   4. Later we need to insert another tuple in same table, and
>      we again choose index 3 on this page as the place to put it.
>   5. system crash leads to replay from WAL.
> Now we'll have two different WAL records trying to insert tuple 3.
> Not good.
Problem for indexes only. heap xlrecs don't specify exact insert points
so they'd replay just fine even if they were not originally inserted
> I'm thinking we should change the code and the README to specify that
> you must mark the buffer dirty before you can END_CRIT_SECTION().
Should we just do this for indexes only? (Or any structure that requires
an exact physical position to be recorded in WAL).
Accesses to local buffers don't need to be critical sections either.
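
To make the ordering issue concrete, here is a rough toy model of the quoted
scenario in C. It is only a sketch and does not use any real backend code:
Sim, sim_insert(), sim_evict(), run_scenario() and friends are hypothetical
names, the "page" is just an array of slots, and replay is simplified to
start from the page image as of the last checkpoint. The only thing it is
meant to show is the effect of setting the dirty flag before versus after
the point where END_CRIT_SECTION() would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SLOTS 8
#define WAL_MAX    16

typedef struct { int slots[PAGE_SLOTS]; } Page;   /* 0 means "slot is free" */
typedef struct { int slot; int value; } WalRecord;

/* Hypothetical toy state: one page, one shared buffer, one WAL stream. */
typedef struct {
    Page      base;       /* page image as of the last checkpoint */
    Page      disk;       /* what is currently on disk */
    Page      buf;        /* copy held in the shared buffer */
    bool      buf_dirty;
    WalRecord wal[WAL_MAX];
    int       wal_len;
} Sim;

void sim_init(Sim *s)
{
    memset(s, 0, sizeof *s);
    for (int i = 0; i < 3; i++)       /* slots 0..2 already used, so the */
        s->disk.slots[i] = i + 1;     /* next free slot is index 3       */
    s->base = s->disk;
    s->buf = s->disk;
}

int sim_first_free(const Page *p)
{
    for (int i = 0; i < PAGE_SLOTS; i++)
        if (p->slots[i] == 0)
            return i;
    return -1;
}

/* Returns false if the simulated elog(ERROR) fired after the critical section. */
bool sim_insert(Sim *s, int slot, int value,
                bool dirty_inside_crit, bool error_after_crit)
{
    assert(slot >= 0 && slot < PAGE_SLOTS && s->wal_len < WAL_MAX);

    /* START_CRIT_SECTION() would be here */
    s->buf.slots[slot] = value;                        /* change the page  */
    s->wal[s->wal_len++] = (WalRecord){ slot, value }; /* make XLOG entry  */
    if (dirty_inside_crit)
        s->buf_dirty = true;                           /* proposed rule    */
    /* END_CRIT_SECTION() would be here */

    if (error_after_crit)
        return false;          /* error before the WriteBuffer call */
    s->buf_dirty = true;       /* stands in for the later WriteBuffer */
    return true;
}

/* Discard the buffer: written out only if dirty, then re-read from disk. */
void sim_evict(Sim *s)
{
    if (s->buf_dirty)
        s->disk = s->buf;
    s->buf = s->disk;
    s->buf_dirty = false;
}

/* Replay WAL over the checkpoint image; count inserts into a used slot. */
int sim_replay_conflicts(const Sim *s)
{
    Page p = s->base;
    int  conflicts = 0;
    for (int i = 0; i < s->wal_len; i++) {
        if (p.slots[s->wal[i].slot] != 0)
            conflicts++;
        p.slots[s->wal[i].slot] = s->wal[i].value;
    }
    return conflicts;
}

/* Steps 1-5 of the quoted scenario; returns the number of replay conflicts. */
int run_scenario(bool dirty_inside_crit)
{
    Sim s;
    sim_init(&s);
    sim_insert(&s, sim_first_free(&s.buf), 100, dirty_inside_crit, true);
    sim_evict(&s);
    sim_insert(&s, sim_first_free(&s.buf), 200, dirty_inside_crit, false);
    return sim_replay_conflicts(&s);
}
```

In this model, run_scenario(false) ends up with two WAL records aimed at
index 3, while run_scenario(true) puts the second tuple at index 4 because
the first change reached disk when the buffer was discarded. Note the model
always records an exact slot in WAL, so it only speaks to the index-like case.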
Best Regards, Simon Riggs