Re: Torn page hazard in ginRedoUpdateMetapage()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Torn page hazard in ginRedoUpdateMetapage()
Date: 2012-05-03 04:16:34
Message-ID: 11920.1336018594@sss.pgh.pa.us
Lists: pgsql-hackers

Daniel Farina <daniel(at)heroku(dot)com> writes:
> On Wed, May 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> Can we indeed assume that all support-worthy filesystems align the start of
>> every file to a physical sector? I know little about modern filesystem
>> design, but these references leave me wary of that assumption:
>>
>> http://www.mail-archive.com/linux-btrfs(at)vger(dot)kernel(dot)org/msg14690.html
>> http://en.wikipedia.org/wiki/Block_suballocation
>>
>> If it is a safe assumption, we could exploit it elsewhere.

> Not to say whether this is safe or not, but it *is* exploited
> elsewhere, as I understand it: the pg_control information, whose
> justification for its safety is its small size. That may point to a
> very rare problem with pg_control rather than the safety of the assumption
> it makes.

I think it's somewhat common now for filesystems to attempt to optimize
very small files (on the order of a few dozen bytes) in that way. It's
hard to see the upside of changing the conventional storage
allocation when the file is sector-sized or larger; the file system does
have to be prepared to rewrite the file on demand, and moving it from
one place to another isn't cheap.

That Wikipedia reference argues for doing this type of optimization on
the last partial block of a file, which is entirely irrelevant for our
purposes since we always ask for page-multiples of space. (The fact
that much of that might be useless padding is, I think, unknown to the
filesystem.)
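
(Illustrative sketch, not part of the original message.) The alignment argument can be made concrete with a little arithmetic: if a file begins on a sector boundary and is laid out in page multiples, a small region at the start of any page sits inside one sector, whereas a file packed at an arbitrary byte offset can have that same region straddle two sectors. The function name, the 512-byte sector size, and the 8192-byte page size below are assumptions chosen for illustration; real devices and builds may differ.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512   /* assumed physical sector size */
#define PAGE_SIZE   8192  /* assumed page size, a multiple of SECTOR_SIZE */

/*
 * Hypothetical helper: returns true if bytes [off, off + len) of a file
 * fall within a single physical sector.  file_start is the byte position
 * on the device where the file begins; 0 (or any multiple of SECTOR_SIZE)
 * models a sector-aligned file, anything else models a packed small file.
 */
bool
range_in_one_sector(size_t file_start, size_t off, size_t len)
{
    size_t first_sector;
    size_t last_sector;

    if (len == 0)
        return true;    /* an empty range cannot be torn */

    first_sector = (file_start + off) / SECTOR_SIZE;
    last_sector = (file_start + off + len - 1) / SECTOR_SIZE;

    return first_sector == last_sector;
}
```

With these assumed sizes, `range_in_one_sector(0, 3 * PAGE_SIZE, 100)` is true, since the first 100 bytes of the fourth page lie in one sector. `range_in_one_sector(450, 3 * PAGE_SIZE, 100)` is false, since a file starting 450 bytes into a sector pushes that same region across a sector boundary.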

Having said all that, I wasn't really arguing that this was a guaranteed
safe thing for us to rely on; just pointing out that it's quite likely
that the issue hasn't been seen in the field because of this type of
consideration.

regards, tom lane
