On Thu, Dec 22, 2011 at 4:00 AM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
> On 2011-12-22 09:42, Florian Weimer wrote:
>> * David Fetter:
>>> The issue is that double writes needs a checksum to work by itself,
>>> and page checksums more broadly work better when there are double
>>> writes, obviating the need to have full_page_writes on.
>> How desirable is it to disable full_page_writes? Doesn't it cut down
>> recovery time significantly because it avoids read-modify-write cycles
>> with a cold cache?
> What are the downsides of having full_page_writes enabled, apart from
> log volume? The manual mentions something about speed, but it is
> a bit unclear where that would come from, since the full pages must
> be somewhere in memory when being worked on anyway.
I thought I would share some of my perspective on this checksum +
doublewrite proposal from a performance point of view.
Currently, from what I see in our tests based on dbt2, DVDStore, etc.,
checksums do not impact scalability or total measured throughput. They
do increase CPU cycles, depending on the algorithm used, but not by
anything that causes problems. The doublewrite change will be the big
performance win compared to full_page_writes. For example, compared to
other databases our WAL traffic is among the highest, and most of it is
attributable to full_page_writes. The reason full_page_writes is
necessary in production (at least without worrying about replication
impact) is that if a write fails, we can recover the whole page from
the WAL as-is and just put it back out there.
(In fact, I believe that is what recovery does.) However, the net
impact under high OLTP load is that the runtime overhead on the WAL is
high due to the heavy traffic, and compared to other databases our
utilization is high. This also has a huge impact on transaction
response time the first time a page is changed after a checkpoint,
which in all OLTP environments adds up because of the sheer volume of
transactions.
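To make the recovery story concrete, here is a toy sketch of how a full page image lets recovery repair a torn write. This is not PostgreSQL's actual code: the page layout, the CRC32 checksum, and the `wal_full_page_images` dict standing in for the WAL are all assumptions for illustration.

```python
import zlib

PAGE_SIZE = 8192

def make_page(payload: bytes) -> bytes:
    """Build a page whose first 4 bytes are a CRC32 of the page body."""
    body = (payload * (PAGE_SIZE // len(payload) + 1))[:PAGE_SIZE - 4]
    return zlib.crc32(body).to_bytes(4, "big") + body

def page_is_valid(page: bytes) -> bool:
    """Detect a torn/partial write by re-checking the stored CRC."""
    return int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])

# Full page image saved in the WAL on the first change after a
# checkpoint (this dict stands in for the WAL).
wal_full_page_images = {42: make_page(b"committed row data")}

# Simulate a torn write: only the first half of the page reached disk.
torn = wal_full_page_images[42][:PAGE_SIZE // 2] + b"\x00" * (PAGE_SIZE // 2)

def recover(block_no: int, on_disk: bytes) -> bytes:
    """If the on-disk page fails validation, restore the WAL copy wholesale."""
    if page_is_valid(on_disk):
        return on_disk
    return wal_full_page_images[block_no]

assert not page_is_valid(torn)
assert page_is_valid(recover(42, torn))
```

The key point is that recovery never has to reason about which bytes tore; it just replaces the whole page with the known-good image.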
When we use doublewrites with checksums, we can safely disable
full_page_writes, causing a HUGE reduction in WAL traffic without loss
of reliability in the face of a write fault, since there are always two
writes (implementation details discussable). And since the double
writes themselves are sequential, bundling multiple such writes
together further reduces the write time. The biggest improvement is
that these writes are no longer done at TRANSACTION COMMIT but at
CHECKPOINT time, which improves transaction performance drastically
for OLTP applications while still giving the reliability that is
needed.
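The checkpoint-time sequence described above can be sketched as follows. This is a minimal illustration, not a proposed implementation: the file names, the 8-byte block-number header, and the exact write ordering are assumptions. The essential invariant is that the doublewrite file is fsynced before any in-place write begins, so at least one intact copy of every page always exists.

```python
import os
import tempfile

PAGE = 4096

def checkpoint_with_doublewrite(dirty, datafile, dwfile):
    """Write all dirty pages sequentially to the doublewrite file, fsync
    it, then write each page to its real location. If either write tears,
    the other copy is intact, so full_page_writes is not needed."""
    with open(dwfile, "wb") as dw:            # 1. sequential batch write
        for block_no, page in sorted(dirty.items()):
            dw.write(block_no.to_bytes(8, "big") + page)
        dw.flush()
        os.fsync(dw.fileno())                 # 2. durable before in-place writes
    with open(datafile, "r+b") as df:         # 3. now write pages in place
        for block_no, page in dirty.items():
            df.seek(block_no * PAGE)
            df.write(page)
        df.flush()
        os.fsync(df.fileno())

# Usage: checkpoint one dirty page into a 4-page data file.
tmp = tempfile.mkdtemp()
data = os.path.join(tmp, "data")
dw = os.path.join(tmp, "doublewrite")
with open(data, "wb") as f:
    f.write(b"\x00" * PAGE * 4)
checkpoint_with_doublewrite({2: b"B" * PAGE}, data, dw)
with open(data, "rb") as f:
    f.seek(2 * PAGE)
    assert f.read(PAGE) == b"B" * PAGE
```

Because step 1 is one sequential write of the whole batch, its cost amortizes across many pages, which is where the bundling win mentioned above comes from.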
Typically, performance in terms of system throughput (tps) looks like:
tps(full_page_writes) << tps(no full page writes)
With doublewrite and CRC we see:
tps(full_page_writes) << tps(doublewrite) < tps(no full page writes)
which is a big win for production systems: you get the reliability of
full_page_writes at close to the performance of running without them.
A side effect on response times is that they become more level, unlike
with full_page_writes, where response time varies from roughly 0.5 ms
to 5 ms depending on whether a given transaction needs to write a full
page to the WAL or not. With doublewrite it can stay around 0.5 ms
rather than showing huge deviations in transaction performance. Folks
measuring 90th-percentile response times will see a huge relief in
trying to meet their SLAs.
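The p90 effect is easy to see with the 0.5 ms / 5 ms figures above. The distribution shape here is an assumption (say 15% of transactions happen to be the first to touch a page after a checkpoint); the point is only that a modest fraction of slow outliers drags the whole 90th percentile up.

```python
import statistics

# With full_page_writes: most transactions ~0.5 ms, but the first touch
# of a page after a checkpoint pays ~5 ms for the full-page WAL record.
# Assume (for illustration) 15% of transactions hit that path.
with_fpw = [0.5] * 85 + [5.0] * 15
with_doublewrite = [0.5] * 100   # commit path never writes full pages

def p90(xs):
    """90th percentile: last of the nine decile cut points."""
    return statistics.quantiles(xs, n=10)[-1]

print(f"p90 with full_page_writes: {p90(with_fpw):.1f} ms")   # 5.0 ms
print(f"p90 with doublewrite:      {p90(with_doublewrite):.1f} ms")   # 0.5 ms
```

With 15% of transactions at 5 ms, the 90th percentile lands squarely on the slow path, a 10x difference in the SLA number even though the median is identical.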
Also, from a WAL perspective, I like to put the WAL on its own
LUN/spindle/VMDK, etc. The net result of the reduced WAL traffic is
that my utilization drops, which means the same hardware can now handle
higher WAL traffic in terms of IOPS, so the WAL itself becomes less of
a bottleneck. Typically this shows up as a reduction in transaction
response times and an increase in tps until some other bottleneck
becomes the gating factor.
So overall this is a big win.