Re: Reduction in WAL for UPDATEs

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reduction in WAL for UPDATEs
Date: 2007-03-28 12:54:04
Message-ID: 20070328125404.GC8214@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 28, 2007 at 08:07:14AM +0100, Simon Riggs wrote:
> On Tue, 2007-03-27 at 20:48 -0400, Tom Lane wrote:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > > It seems possible to reduce overall WAL volume by roughly 25% on common
> > > workloads by optimising the way UPDATE statements generate WAL.
> >
> > This seems a huge amount of work to optimize *one* benchmark.
>
> Please don't beat me with that. I wouldn't suggest it if I didn't think
> it would help real users. The analysis of the WAL volume was done using
> a benchmark, but only as a guide to indicate likely usage patterns.
> There aren't many real world heavy UPDATE scenarios to analyze right now
> because people have previously actively avoided such usage.
>
> > If it
> > weren't so narrowly focused on the properties of a particular benchmark
> > (mostly UPDATE, mostly a few columns in wide tuples), I'd get more
> > excited.
>
> Updating the current balance on a Customer Account is one of the main
> focus areas for HOT. Those records are typically at least 250 bytes
> long, so we can save ~200 bytes of WAL per UPDATE for the most frequent
> types of UPDATE. Sure, not all UPDATEs would be optimised, but then they
> are much less frequent.
>
> As I mentioned, the WAL volume is disproportionately generated by
> UPDATEs of longer rows, so optimising WAL for just a few tables can make
> a big difference to the overall volume.
>
> > The extra time spent holding exclusive lock on the page
> > doesn't sound attractive either ...
>
> Agreed, thats why I set a fairly high bar for when this would kick in.
> The fewer rows on a page, the less contention.
>
We use DSPAM as one of our anti-spam options. Its UPDATE pattern is to
increment a spam counter or a not-spam counter while keeping the user and
token information the same. This would benefit from this optimization.
Currently we are forced to use MySQL with MyISM tables to support the
update load, although PostgreSQL 8.2 performance is right at the I/O
break-even point for switching databases. With HOT and more optimized
UPDATE I/O, 8.3 would give us enough I/O headroom to switch to PostgreSQL.

Ken Marshall

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2007-03-28 13:33:50 Re: Log levels for checkpoint/bgwriter monitoring
Previous Message Gregory Stark 2007-03-28 10:53:10 Re: Patch queue concern