Tom Lane wrote:
> That seems too fragile to me, as I don't find it a stretch at all to
> think that writing the map file might fail --- just think Windows
> antivirus code :-(. Now, once we have written the WAL record for
> the mapfile change, we can't really afford a failure in my approach
> either. But I think a rename() after successfully creating/writing/
> fsync'ing a temp file is a whole lot safer than writing from a standing
My gut feeling is exactly the opposite. Creating and renaming a file
involves operations (and permissions) on the directory, while
overwriting a small file is just a simple write(). Especially if you
open() the file before doing anything irreversible.
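To make the "open() first" idea concrete, here is a minimal sketch using plain POSIX calls. The function names open_map_for_update() and overwrite_map() are hypothetical and are not PostgreSQL functions. The point is only that the step most likely to fail for permission or antivirus reasons, the open(), happens before the caller does anything irreversible.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the existing map file for update.
 * No O_CREAT, so nothing is done to the directory; if this fails,
 * the caller has not yet done anything irreversible.
 */
int
open_map_for_update(const char *path)
{
    return open(path, O_RDWR);
}

/*
 * Hypothetical helper: overwrite the start of the file through the
 * already-opened descriptor and flush it to disk.
 * Returns 0 on success, -1 on failure.
 */
int
overwrite_map(int fd, const void *buf, size_t len)
{
    ssize_t n = pwrite(fd, buf, len, 0);

    if (n < 0 || (size_t) n != len)
        return -1;
    if (fsync(fd) != 0)
        return -1;
    return 0;
}
```

A caller would call open_map_for_update() first, then perform the irreversible step, and only then call overwrite_map() on the descriptor it already holds.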
> The other problem with what you sketch is that it'd require holding the
> mapfile write lock across commit, because we still have to have strict
> serialization of updates.
Why is the strict serialization of updates needed? To avoid overwriting
the file with stale contents in a race condition?
I was thinking that we only store the modified part in the WAL record.
Right after writing the commit record, take the lock, read() the map file,
modify it in memory, write() it back, and release the lock.
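As a rough illustration of that sequence, the sketch below applies a single changed entry to a small map file. Everything in it is an assumption made for the example: the MapEntry/MapFile layout, the MAP_ENTRIES size, the apply_map_delta() name, and the use of flock() as a stand-in for whatever lock the server would really take. The fsync() is also an addition, not part of the steps listed above.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_ENTRIES 4           /* hypothetical, tiny on purpose */

typedef struct
{
    uint32_t    oid;            /* catalog OID */
    uint32_t    filenode;       /* file currently backing it */
} MapEntry;

typedef struct
{
    MapEntry    e[MAP_ENTRIES];
} MapFile;

/*
 * Hypothetical: apply one delta (the "modified part" carried in the
 * WAL record) to the on-disk map. Returns 0 on success, -1 on any
 * failure, including the OID not being present in the map.
 */
int
apply_map_delta(const char *path, uint32_t oid, uint32_t new_filenode)
{
    MapFile     map;
    int         rc = -1;
    int         fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0)            /* take the lock */
    {
        close(fd);
        return -1;
    }

    /* read the whole map, modify it in memory, write it back */
    if (pread(fd, &map, sizeof(map), 0) == (ssize_t) sizeof(map))
    {
        for (int i = 0; i < MAP_ENTRIES; i++)
        {
            if (map.e[i].oid != oid)
                continue;
            map.e[i].filenode = new_filenode;
            if (pwrite(fd, &map, sizeof(map), 0) == (ssize_t) sizeof(map) &&
                fsync(fd) == 0)
                rc = 0;
            break;
        }
    }

    flock(fd, LOCK_UN);                     /* release the lock */
    close(fd);
    return rc;
}
```

Because each call rereads the file under the lock, two backends applying deltas one after the other cannot overwrite each other's change with stale contents.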
That means that there are no full images of the file in WAL records, which
makes me slightly uncomfortable from a disaster-recovery point of view,
but we could keep a backup copy of the file in the data directory or
something if that's too scary otherwise.
> Maybe we should forget the
> rename() trick and overwrite the map file in place. I still think it
> needs to be a separate WAL record though. I'm thinking
> * obtain lock
> * open file for read/write
> * read current contents
> * construct modified contents
> * write and sync WAL record
> * write back file through already-opened descriptor
> * fsync
> * release lock
> Not totally clear if this is more or less safe than the rename method;
> but given the assumption that the file is less than one disk block,
> it should be just as atomic as pg_control updates are.
That doesn't solve the problem I was trying to solve, which is that if
the map file is rewritten, but the transaction later aborts, the map
file update has hit the disk already. That's why I wanted to stash it
into the commit record.