Re: WAL Re-Writes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL Re-Writes
Date: 2016-02-08 14:46:40
Message-ID: 20160208144639.r425hm2gbbi7w7gi@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-02-08 10:38:55 +0530, Amit Kapila wrote:
> I think deciding it automatically without user require to configure it,
> certainly has merits, but what about some cases where user can get
> benefits by configuring themselves like the cases where we use
> PG_O_DIRECT flag for WAL (with o_direct, it will by bypass OS
> buffers and won't cause misaligned writes even for smaller chunk sizes
> like 512 bytes or so). Some googling [1] reveals that other databases
> also provides user with option to configure wal block/chunk size (as
> BLOCKSIZE), although they seem to decide chunk size based on
> disk-sector size.

FWIW, you usually can't do that small writes with O_DIRECT. Usually it
has to be 4KB (pagesize) sized, aligned (4kb again) writes. And on
filesystems that do support doing such writes, they essentially fall
back to doing buffered IO.

> An additional thought, which is not necessarily related to this patch is,
> if user chooses and or we decide to write in 512 bytes sized chunks,
> which is usually a disk sector size, then can't we think of avoiding
> CRC for each record for such cases, because each WAL write in
> it-self will be atomic. While reading, if we process in wal-chunk-sized
> units, then I think it should be possible to detect end-of-wal based
> on data read.

O_DIRECT doesn't give any useful guarantees to do something like the
above. It doesn't have any ordering or durability implications. You
still need to do fdatasyncs and such.

Besides, with the new CRC implications, that doesn't really seem like
such a large win anyway.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-02-08 14:54:16 Re: checkpointer continuous flushing - V16
Previous Message David Rowley 2016-02-08 14:43:18 Re: WIP: Make timestamptz_out less slow.