Re: Setting BLCKSZ 4kB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, sanyam jain <sanyamjain22(at)live(dot)in>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting BLCKSZ 4kB
Date: 2018-01-26 23:37:46
Message-ID: 20180126233746.pwmsn42d4qfweptu@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-01-27 00:28:07 +0100, Tomas Vondra wrote:
> But does that make the internal page size relevant to the atomicity
> question? For example, let's say we write 4kB on a drive with 2kB
> internal pages, and the power goes out after writing the first 2kB of
> data (so the second 2kB gets lost). The disk however never
> confirmed the 4kB write, exactly because of the write barrier ...

That would be problematic, yes. That's *precisely* the torn page issue
we're worried about with regard to full page writes. Consider, as just
one of many examples, a crash during WAL apply: the first half of the
page might be new, the other half old. The next time we try to apply the
record we'd skip it, because the LSN in the page would indicate the page
is already new enough. With FPWs that doesn't happen, because the first
time through we'll reapply the whole write.
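
To make that concrete, here's a toy model of the failure. It is not
PostgreSQL code: Page, WalRecord, torn_write and replay are names made up
for illustration, and it assumes the page LSN lives in the first half of
the page and that a full page image is restored regardless of the LSN
found on disk.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int           # assumed to live in the page header, i.e. the first half
    first_half: str
    second_half: str


@dataclass
class WalRecord:
    lsn: int
    new_first: str
    new_second: str
    full_page_image: bool = False   # True models a record carrying an FPW


def torn_write(old: Page, new: Page) -> Page:
    """Only the first half (including the header LSN) reaches the disk."""
    return Page(lsn=new.lsn, first_half=new.first_half,
                second_half=old.second_half)


def replay(page: Page, rec: WalRecord) -> Page:
    """Toy redo: a full page image overwrites the page unconditionally;
    otherwise the record is skipped if the page LSN looks new enough."""
    if rec.full_page_image:
        return Page(rec.lsn, rec.new_first, rec.new_second)
    if page.lsn >= rec.lsn:
        return page    # skipped: the torn second half stays stale
    return Page(rec.lsn, rec.new_first, rec.new_second)


old = Page(lsn=100, first_half="old-A", second_half="old-B")
new = Page(lsn=200, first_half="new-A", second_half="new-B")

# Crash in the middle of writing out the page during WAL apply.
on_disk = torn_write(old, new)

# Second recovery attempt, without and with a full page image in the record.
without_fpw = replay(on_disk, WalRecord(200, "new-A", "new-B"))
with_fpw = replay(on_disk, WalRecord(200, "new-A", "new-B",
                                     full_page_image=True))
```

In this model without_fpw keeps the stale second half even though its LSN
claims the page is current, while with_fpw ends up identical to the
intended new page.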

> I have to admit I'm not sure what happens at this point - whether the
> drive will produce a torn page (with the first 2kB updated and the
> second 2kB old), or if it's smart enough to realize the write barrier
> was not reached.

I don't think you can rely on anything.

> But perhaps this (non-volatile write cache) is one of the requirements
> for disabling full page writes?

I don't think that's reliably doable due to the limited knowledge about
what exactly happens inside each and every model of drive.

Greetings,

Andres Freund
