Re: pg_dump performance

From: Jared Mauch <jared(at)puck(dot)nether(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Jared Mauch <jared(at)puck(dot)nether(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_dump performance
Date: 2007-12-27 14:58:43
Message-ID: 20071227145843.GA96276@puck.nether.net
Lists: pgsql-performance

On Thu, Dec 27, 2007 at 01:14:25PM +0000, Gregory Stark wrote:
> "Jared Mauch" <jared(at)puck(dot)nether(dot)net> writes:
>
> > pg_dump is utilizing about 13% of the cpu and the
> > corresponding postgres backend is at 100% cpu time.
> > (multi-core, multi-cpu, lotsa ram, super-fast disk).
> >...
> > pg8.3(beta) with the following variances from default
> >
> > checkpoint_segments = 300 # in logfile segments, min 1, 16MB each
> > effective_cache_size = 512MB # typically 8KB each
> > wal_buffers = 128MB # min 4, 8KB each
> > shared_buffers = 128MB # min 16, at least max_connections*2, 8KB each
> > work_mem = 512MB # min 64, size in KB
>
> Fwiw those are pretty unusual numbers. Normally work_mem is much smaller than
> shared_buffers since you only need one block of memory for shared buffers and
> work_mem is for every query (and every sort within those queries). If you have
> ten queries running two sorts each this setting of work_mem could consume 5GB.

I'd still have lots of ram left :)

I'm dealing with queries that typically match 5-10 million rows on the
(starttime) index, not counting the filter applied afterwards. Each backend
rarely makes it over 256MB.
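
If the global setting ever does become a problem, one option would be to keep
the postgresql.conf value modest and raise work_mem only in the sessions that
run the big sorts/aggregations. A minimal sketch (the table and query below
are made up for illustration, only the index on starttime comes from my
actual setup):

    -- postgresql.conf: keep a conservative global default, e.g.
    --   work_mem = 64MB

    -- then, in the session running the heavy query:
    SET work_mem = '512MB';
    SELECT date_trunc('minute', starttime) AS bucket, count(*)
      FROM samples                         -- hypothetical table name
     WHERE starttime >= now() - interval '1 hour'
     GROUP BY 1;
    RESET work_mem;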

> Raising shared buffers could improve your pg_dump speed. If all the data is in
> cache it would reduce the time spend moving data between filesystem cache and
> postgres shared buffers.

I doubt it's all in cache, but I can look into this. I didn't do a lot of
fine-tuning of these numbers, just enough to get past the defaults and reach
acceptable performance.
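
A rough way to check is the hit/read counters in pg_stat_database (or
pg_statio_user_tables for a per-table view). A sketch, assuming the stats
collector is enabled:

    -- Fraction of block requests served from shared_buffers vs. read
    -- from the OS cache/disk for the current database:
    SELECT datname,
           blks_hit,
           blks_read,
           round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2)
               AS hit_pct
      FROM pg_stat_database
     WHERE datname = current_database();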

> What made you raise wal_buffers so high? I don't think it hurts but that's a
> few orders of magnitude higher than what I would expect to help.

I'm adding chunks of ~1.2m rows every other minute. Once I increase
my data collection pool, this will go up to around 2-3m rows or so[1]. I
found that raising wal_buffers and checkpoint_segments helped, but I didn't
spend a lot of time tweaking these options. Do you know of a way to determine
high-watermark numbers for what is actually being used?
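
For the checkpoint side, I gather pg_stat_bgwriter (new in 8.3) plus
log_checkpoints = on should show whether WAL volume is forcing checkpoints
early; a sketch of what I'd look at:

    -- If checkpoints_req climbs much faster than checkpoints_timed,
    -- checkpoints are being triggered by WAL volume and
    -- checkpoint_segments may still be too low for the insert rate:
    SELECT checkpoints_timed,
           checkpoints_req,
           buffers_checkpoint,
           buffers_backend
      FROM pg_stat_bgwriter;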

- Jared

[1] - I am concerned that with my insert speed around 100k rows/sec
and raw pg_dump speed around 182k rows/sec, I will start receiving data
faster than it can be stored and post-processed.

--
Jared Mauch | pgp key available via finger from jared(at)puck(dot)nether(dot)net
clue++; | http://puck.nether.net/~jared/ My statements are only mine.
