Re: Massive table (500M rows) update nightmare

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Massive table (500M rows) update nightmare
Date: 2010-01-08 06:21:31
Message-ID: dcc563d11001072221g66d3db9cr3556d49b2bdee47e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Jan 7, 2010 at 11:14 PM, Carlo Stonebanks
<stonec(dot)register(at)sympatico(dot)ca> wrote:
>> It might well be checkpoints.  Have you tried cranking up checkpoint
>> segments to something like 100 or more and seeing how it behaves then?
>
> No I haven't, althugh it certainly make sense - watching the process run,
> you get this sense that the system occaisionally pauses to take a deep, long
> breath before returning to work frantically ;D
>
> Checkpoint_segments are currently set to 64. The DB is large and is on a
> constant state of receiving single-row updates as multiple ETL and
> refinement processes run continuously.
>
> Would you expect going to 100 or more to make an appreciable difference, or
> should I be more aggressive?

If you're already at 64 then not much. Probably wouldn't hurt to
crank it up more and delay the checkpoints as much as possible during
these updates. 64 segments is already 1024M. If you're changing a
lot more data than that in a single update / insert then cranking them
up more might help.

What you might need to do is to change your completion target to
something closer to 100% since it's likely that most of the updates /
inserts are not happening to the same rows over and over, but to
different rows for each one, the closer you can get to 100% completed
before the next checkpoint the better. This will cause some more IO
to happen, but will even it out more (hopefully) so that you don't get
checkpoint spikes. Look into the checkpoint logging options so you
can monitor how they're affecting system performance.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tore Halvorsen 2010-01-08 11:03:55 FusionIO performance
Previous Message Carlo Stonebanks 2010-01-08 06:14:58 Re: Massive table (500M rows) update nightmare