Re: increasing the default WAL segment size

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Beena Emerson <memissemerson(at)gmail(dot)com>
Subject: Re: increasing the default WAL segment size
Date: 2017-01-03 15:44:50
Message-ID: CA+TgmoYJJei4b_rifB2uMLEcfs5W6UPWGVE-gBsRsUwMSiqtiw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 3, 2017 at 8:59 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 3 January 2017 at 13:45, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Tue, Jan 3, 2017 at 6:41 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> On 2 January 2017 at 21:23, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>>>
>>>> It's not clear from the thread that there is consensus that this feature is desired. In particular, the performance aspects of changing segment size from a C constant to a variable are in question. Someone with access to large hardware should test that. Andres[1] and Robert[2] did suggest that the option could be changed to a bitshift, which IMHO would also solve some sanity-checking issues.
>>>
>>> Overall, Robert has made a good case. The only discussion now is about
>>> the knock-on effects it causes.
>>>
>>> One concern that has only barely been discussed is the effect of
>>> zeroing new WAL files. That is a linear effect and will adversely
>>> affect performance as WAL segment size increases.
>>>
>>
>> Sorry, but I am not able to understand why this is a problem. The
>> bigger the size of the WAL segment, the fewer the number of files. So
>> IIUC, it can only have an impact if zeroing two 16MB files is cheaper
>> than zeroing one 32MB file. Is that your theory, or do you have
>> something else in mind?
>
> The issue I see is that at present no backend needs to do more than
> 16MB of zeroing at one time, so the impact on response time is
> reduced. If we start doing zeroing in larger chunks then the impact on
> response times will increase. So instead of regular blips we have one
> large blip, less often. I think the latter will be worse, but welcome
> measurements that show that performance is smooth and regular with
> large file sizes.

Yeah. I don't think there's any way to get around the fact that there
will be bigger latency spikes in some cases with larger WAL files. I
think the question is whether they'll be common enough or serious
enough to worry about. For example, in a quick test on my laptop,
zero-filling a 16 megabyte file using "dd if=/dev/zero of=x bs=8k
count=2048" takes about 11 milliseconds, and zero-filling a 64
megabyte file with a count of 8192 increases the time to almost 50
milliseconds. That's something, but I wouldn't rate it as concerning.
There are a lot of things that can cause latency changes multiple
orders of magnitude larger than that, so worrying about that one in
particular would seem to me to be fairly pointless. However, that's
also a measurement on an unloaded system with an SSD, and the impact
may be a lot more on a big system with lots of concurrent activity;
and if the process that does the write also has to do an fsync, that
will increase the cost considerably, too.
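
For anyone who wants to reproduce something a bit closer to what the
server actually does, here's a rough standalone sketch (my own
throwaway test program, not PostgreSQL code; the block and segment
sizes are just assumptions to play with) that zero-fills a file in 8kB
writes and then fsyncs it, roughly the way XLogFileInit pre-allocates
a segment:

    /* walzero.c -- time zero-filling a "segment" in 8kB blocks + fsync */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define BLCKSZ   8192
    #define SEG_SIZE (64 * 1024 * 1024)   /* compare 16MB vs 64MB here */

    int
    main(void)
    {
        char            block[BLCKSZ];
        struct timeval  t0, t1;
        long            written = 0;
        int             fd;

        memset(block, 0, sizeof(block));
        fd = open("walseg.test", O_CREAT | O_TRUNC | O_WRONLY, 0600);
        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        gettimeofday(&t0, NULL);
        while (written < SEG_SIZE)
        {
            if (write(fd, block, BLCKSZ) != BLCKSZ)
            {
                perror("write");
                return 1;
            }
            written += BLCKSZ;
        }
        if (fsync(fd) != 0)        /* the fsync is a big part of the cost */
        {
            perror("fsync");
            return 1;
        }
        gettimeofday(&t1, NULL);

        printf("zero-filled %ld MB in %.1f ms\n", written / (1024 * 1024),
               (t1.tv_sec - t0.tv_sec) * 1000.0 +
               (t1.tv_usec - t0.tv_usec) / 1000.0);

        close(fd);
        unlink("walseg.test");
        return 0;
    }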

But the flip side is that it's wrong to imagine that there's no harm
in leaving the situation as it is. Even my MacBook Pro can crank out
about 2.7 WAL segments/second on "pgbench -c 16 -j 16 -T 60". I think
a decent server with a few more CPU cores than my laptop has could do
4-5 times that. So we shouldn't imagine that the costs of spewing out
a bajillion segment files are being paid only at the very high end.
Even somebody running PostgreSQL on a low-end virtual machine might
find it difficult to write an archive_command that can keep up if the
system is under continuous load. Of course, as Stephen pointed out,
there are toolkits that can do it and you should probably be using one
of those anyway for other reasons, but nevertheless spitting out
almost 3 WAL segments per second even on a laptop gives a whole new
meaning to the term "continuous archiving".
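
To put rough numbers on that (back-of-the-envelope, using the figures
above): 2.7 segments/second at 16MB each is around 43MB/s of WAL, and
it means archive_command has only about 370 milliseconds per segment
before it starts falling behind; at 4-5 times that rate the budget
drops to well under 100 milliseconds per segment.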

Another point to consider is that a bigger WAL segment size can
actually *improve* latency because every segment switch triggers an
immediate fsync, and every backend in the system ends up waiting for
it to finish. We should probably eventually try to push those flushes
into the background, and the zeroing work as well. My impression
(possibly incorrect?) is that we expect to settle into a routine where
zeroing new segments is relatively uncommon because we reuse old
segment files, but the forced end-of-segment flushes never go away.
So it's possible we might actually come out ahead on latency with this
change, at least sometimes.
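
(Again, just rough arithmetic on my numbers above: at ~2.7 16MB
segments/second that's ~2.7 forced end-of-segment flushes per second
that every backend can end up waiting behind; with 64MB segments the
same workload would force only about 0.7 such flushes per second.)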

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
