Re: patch to allow disable of WAL recycling

From: Jerry Jelinek <jerry(dot)jelinek(at)joyent(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch to allow disable of WAL recycling
Date: 2019-03-07 23:35:09
Message-ID: CACPQ5FqJPtzR5qB0mvuAByFTKFFi9ORoKS9aX6jONcS9qo9nsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas,

Responses in-line.

On Thu, Mar 7, 2019 at 3:09 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> On Fri, Mar 8, 2019 at 10:13 AM Jerry Jelinek <jerry(dot)jelinek(at)joyent(dot)com>
> wrote:
> > I have attached a new version of the patch that implements the changes
> we've discussed over the past couple of days. Let me know if there are any
> comments or suggestions.
>
> + fail = lseek(fd, wal_segment_size - 1, SEEK_SET) < (off_t) 0 ||
> + (int) write(fd, zbuffer.data, 1) != (int) 1;
>
> BTW we now have pg_pwrite() to do this in one syscall.
>

Thanks for the pointer, I'll take a look at that.

>
> + Disabling this option prevents zero-filling new WAL files.
> + This parameter should only be set to <literal>off</literal>
> when the WAL
> + resides on a <firstterm>Copy-On-Write</firstterm>
> (<acronym>COW</acronym>)
> + filesystem.
>
> Hmm. The comments in the source give the actual motivation for this
> preallocation logic... I wonder why we don't come out and say the same
> thing in the documentation, instead of this vague language about COW
> filesystems.
>
> Here's a suggestion: "Zero-filling new segment files ensures that it
> is safe to use wal_sync_method = fdatasync or wal_sync_method =
> open_datasync on filesystems that synchronize file meta-data and data
> separately. It is not necessary on some filesystems such as ZFS."
>
> My understanding is that it's not really the COW-ness that makes it
> not necessary, it's the fact that fdatasync() doesn't do anything
> different from fsync() on ZFS and there is no situation where
> fdatasync() succeeds, you lose power, you come back up and find that
> the file size is wrong or a hole in the middle of the file has come
> back from the dead, and you lost the data. The whole concept of "data
> sync" implies that file meta-data and file contents are cached and
> synchronized separately and you can deliberately ask for weaker
> coherency to cut down on IOs; *that's* the thing that ZFS doesn't
> have, and couldn't benefit from because it's just going to write stuff
> in its tidy sequential log in the form of all-or-nothing transactions
> anyway. I don't know if that's also true for eg BTRFS or any other
> COW filesystem that might be out there, but I don't know why you'd
> want to mention COW instead of wal_sync_mode as the motivation when
> the source code comment know better.
>

Hopefully I am not misinterpreting your comment here, but I'm not sure I
fully agree with that assessment. I can't speak for other filesystems, but
for ZFS, none of the zero-filled blocks will actually be written to disk,
but that determination happens fairly late in the process, after the
thousands of write system calls have been processed. So on ZFS, these
writes are basically useless, except as a way to increase the size of the
file. No disk space is actually allocated. However, on any COW filesystem,
any write to any of these zero-filled blocks will have to allocate a new
block, so nothing about "preallocating space" has been accomplished by all
of these system calls. At least, preallocating space is my understanding of
why the zero-fill is currently being performed.

> + Disabling this option prevents WAL file recycling.
> + This parameter should only be set to <literal>off</literal>
> when the WAL
> + resides on a COW filesystem.
>
> Would it be better to say what it's for, rather than when to set it?
> To make clear that it's a performance setting, not a safety one.
> "Setting this option to <literal>off</literal> may increase
> performance on copy-on-write filesystems."
>

That sounds good to me, I'll change the wording and post a new patch after
I wait a little while to see if there is any other feedback.

Thanks for taking a look,
Jerry

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-03-08 00:36:07 Re: Update does not move row across foreign partitions in v11
Previous Message Grigory Smolkin 2019-03-07 22:38:29 Re: [PROPOSAL] Drop orphan temp tables in single-mode