Re: postgres on a non-journaling filesystem

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: maayan mordehai <maayanmordehai3(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: postgres on a non-journaling filesystem
Date: 2019-01-23 16:10:47
Message-ID: 20190123161047.boyznas7u7kkybld@alap3.anarazel.de
Lists: pgsql-hackers

On 2019-01-23 14:20:52 +0200, Heikki Linnakangas wrote:
> On 23/01/2019 01:03, maayan mordehai wrote:
> > hello,
> >
> > I'm Maayan, I'm in a DBA team that uses postgresql.
> > I saw in the documentation on wals:
> > https://www.postgresql.org/docs/10/wal-intro.html
> > In the tip box that, it's better not to use a journaling filesystem. and I
> > wanted to ask how it works?
> > can't we get corruption that we can't recover from?
> > I mean what if postgres in the middle of a write to a wal and there is a
> > crash, and it didn't finish.
> > I'm assuming it will detect it when we will start postgres and write that
> > it was rolled back, am I right?
>
> Yep, any half-written transactions will be rolled back.
>
> > and how does it work in the data level? if some of the 8k block is written
> > but not all of it, and then there is a crash, how postgres deals with it?
>
> The first time a block is modified after a checkpoint, a copy of the block
> is written to the WAL. At crash recovery, the block is restored from the
> WAL. This mechanism is called "full page writes".
>
> The WAL works just like the journal in a journaling filesystem. That's why
> it's not necessary to have journaling at the filesystem level.

But note that not having journaling at the FS level often makes OS startup
after a crash *painfully* slow, because fsck or similar will be run. And
that's often necessary for the filesystem's internal consistency.

Note that even with journaling enabled, most filesystems don't journal
data by default, so you can get those partial writes anyway.
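
To make the full page writes mechanism described above a bit more concrete,
here is a rough toy model in Python. It is only a sketch under simplifying
assumptions, not how PostgreSQL actually implements it: the names ToyStore,
write, recover and demo are made up for illustration, the "disk" and the
"WAL" are plain in-memory lists, a torn write is simulated by persisting
only the first half of the page, and the logged image is taken after the
change is applied.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size


class ToyStore:
    """Hypothetical toy model of data pages plus a WAL with full page images."""

    def __init__(self, n_pages):
        self.disk = [bytes(PAGE_SIZE) for _ in range(n_pages)]  # data files
        self.wal = []        # records written since the last checkpoint
        self.imaged = set()  # pages that already have a full image in the WAL

    def checkpoint(self):
        # Assumption: at a checkpoint every page on disk is intact,
        # so earlier WAL is no longer needed for crash recovery.
        self.wal.clear()
        self.imaged.clear()

    def write(self, page_no, offset, data, torn=False):
        page = bytearray(self.disk[page_no])
        page[offset:offset + len(data)] = data
        new_page = bytes(page)

        # The WAL record is written before the data page.
        if page_no not in self.imaged:
            # First change since the checkpoint: log the whole page.
            self.wal.append(("fpi", page_no, new_page))
            self.imaged.add(page_no)
        else:
            # Later changes only log what was modified.
            self.wal.append(("delta", page_no, offset, bytes(data)))

        if torn:
            # Simulate a crash mid-write: only the first half reaches disk.
            half = PAGE_SIZE // 2
            new_page = new_page[:half] + self.disk[page_no][half:]
        self.disk[page_no] = new_page

    def recover(self):
        # Replay the WAL in order; a full page image overwrites whatever
        # (possibly torn) content is on disk, then deltas are reapplied.
        for rec in self.wal:
            if rec[0] == "fpi":
                _, page_no, image = rec
                self.disk[page_no] = image
            else:
                _, page_no, offset, data = rec
                page = bytearray(self.disk[page_no])
                page[offset:offset + len(data)] = data
                self.disk[page_no] = bytes(page)


def demo():
    store = ToyStore(1)
    store.checkpoint()
    store.write(0, 0, b"A" * 16)                 # logs a full page image
    store.write(0, 6000, b"B" * 16, torn=True)   # second half never hits disk
    torn_page = store.disk[0]
    store.recover()
    return torn_page, store.disk[0]
```

In demo(), the second write is lost from the on-disk page because it falls
in the half that never got written, but replaying the WAL first restores the
full page image and then reapplies the later change, so the page ends up
complete regardless of what the filesystem did with the partial write.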

Greetings,

Andres Freund
