Re: WAL Re-Writes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL Re-Writes
Date: 2016-02-03 13:42:41
Message-ID: CA+TgmoZiR-GfG5jNaaXsg4y9pft9OKgh9LMQBMM3SkdDD8=UxQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 3, 2016 at 7:28 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On further testing, it has been observed that misaligned writes could
> cause reads even when blocks related to file are not in-memory, so
> I think what Jan is describing is right. The case where there is
> absolutely zero chance of reads is when we write in OS-page boundary
> which is generally 4K. However I still think it is okay to provide an
> option for WAL writing in smaller chunks (512 bytes, 1024 bytes, etc.)
> for the cases when these are beneficial, like when wal_level is
> greater than or equal to archive, and keep the default as OS-page size if
> the same is smaller than 8K.
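To make the alignment point above concrete, here is a minimal C sketch. It assumes what Amit's testing suggests: the kernel may have to read a page before writing only when a write covers that page partially and the page is not already cached. The helper names `write_is_page_aligned` and `partial_pages` are hypothetical and not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true when a write of len bytes at byte offset off
 * starts and ends on a page boundary, so it covers only whole OS pages.
 * Under the assumption above, such a write never forces a read.
 */
static bool
write_is_page_aligned(size_t off, size_t len, size_t page_size)
{
    assert(page_size > 0);
    return len > 0 && off % page_size == 0 && (off + len) % page_size == 0;
}

/*
 * Hypothetical helper: number of pages (0, 1, or 2) that the write covers
 * only partially; each may need a read if it is not in the page cache.
 */
static int
partial_pages(size_t off, size_t len, size_t page_size)
{
    size_t end;
    int n = 0;

    assert(page_size > 0);
    if (len == 0)
        return 0;
    end = off + len;
    if (off % page_size != 0)
        n++;                    /* head of the write starts mid-page */
    if (end % page_size != 0 &&
        ((end - 1) / page_size != off / page_size || off % page_size == 0))
        n++;                    /* tail ends mid-page, on a different page */
    return n;
}
```

For example, with 4 kB pages a 512-byte write at offset 512 touches one partial page, while a 1024-byte write at offset 3584 straddles a boundary and leaves two partial pages.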

Hmm, a little research seems to suggest that 4kB pages are standard on
almost every system we might care about: x86_64, x86, Power, Itanium,
ARMv7. Sparc uses 8kB, though, and a search through the Linux kernel
sources (grep for PAGE_SHIFT) suggests that there are other obscure
architectures that can at least optionally use larger pages, plus a
few that can use smaller ones.

I'd like this to be something that users don't have to configure, and
it seems like that should be possible. We can detect the page size on
non-Windows systems using sysconf(_SC_PAGESIZE), and on Windows by
using GetSystemInfo. And I think it's safe to make this decision at
configure time, because the page size is a function of the hardware
architecture (it seems there are obscure systems that support multiple
page sizes, but I don't care about them particularly). So what I
think we should do is set an XLOG_WRITESZ along with XLOG_BLCKSZ and
set it to the smaller of XLOG_BLCKSZ and the system page size. If we
can't determine the system page size, assume 4kB.
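A minimal C sketch of the proposed rule, under stated assumptions: `XLOG_WRITESZ` does not exist yet, so `choose_xlog_writesz` and `system_page_size` are hypothetical helpers, and `XLOG_BLCKSZ` is assumed to be the default 8 kB. This version probes the page size at run time; the proposal is to make the decision at configure time, which a configure check could do by compiling and running something like this once.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Assumption: PostgreSQL's default 8 kB WAL block size. */
#define XLOG_BLCKSZ 8192
/* Fallback when the page size cannot be determined. */
#define DEFAULT_OS_PAGE_SIZE 4096

/* Hypothetical helper: ask the OS for its page size, or use the fallback. */
static long
system_page_size(void)
{
#ifdef _WIN32
    SYSTEM_INFO si;

    GetSystemInfo(&si);
    return si.dwPageSize > 0 ? (long) si.dwPageSize : DEFAULT_OS_PAGE_SIZE;
#else
    long sz = sysconf(_SC_PAGESIZE);

    /* sysconf() returns -1 when the value is unavailable */
    return sz > 0 ? sz : DEFAULT_OS_PAGE_SIZE;
#endif
}

/* Hypothetical: XLOG_WRITESZ = min(XLOG_BLCKSZ, system page size). */
static long
choose_xlog_writesz(long page_size)
{
    long writesz;

    if (page_size <= 0)
        page_size = DEFAULT_OS_PAGE_SIZE;
    writesz = page_size < XLOG_BLCKSZ ? page_size : XLOG_BLCKSZ;
    assert(writesz > 0 && writesz <= XLOG_BLCKSZ);
    return writesz;
}
```

On a typical x86_64 box this yields 4096; on an 8 kB-page Sparc it yields 8192, and a system with 64 kB pages would be capped at XLOG_BLCKSZ.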

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
