Re: PostgreSQL 8.4 performance tuning questions

From: Scott Carey <scott(at)richrelevance(dot)com>
To: PFC <lists(at)peufeu(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: PostgreSQL 8.4 performance tuning questions
Date: 2009-08-03 17:00:26
Message-ID: C69C65BA.E081%scott@richrelevance.com
Lists: pgsql-performance


On 7/31/09 4:01 PM, "PFC" <lists(at)peufeu(dot)com> wrote:

> On Fri, 31 Jul 2009 19:04:52 +0200, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Greg Stark <gsstark(at)mit(dot)edu> writes:
>>> On Thu, Jul 30, 2009 at 11:30 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> I did some tracing and verified that pg_dump passes data to deflate()
>>>> one table row at a time.  I'm not sure about the performance
>>>> implications of that, but it does seem like it might be something to
>>>> look into.
>>
>>> I suspect if this was a problem the zlib people would have added
>>> internal buffering ages ago. I find it hard to believe we're not the
>>> first application to use it this way.
>>
>> I dug into this a bit more. zlib *does* have internal buffering --- it
>> has to, because it needs a minimum lookahead of several hundred bytes
>> to ensure that compression works properly. The per-call overhead of
>> deflate() looks a bit higher than one could wish when submitting short
>> chunks, but oprofile shows that "pg_dump -Fc" breaks down about like
>> this:
>
> During dump (size of dump is 2.6 GB),
>
> No Compression :
> - postgres at 70-100% CPU and pg_dump at something like 10-20%
> - dual core is useful (a bit...)
> - dump size 2.6G
> - dump time 2m25.288s
>
> Compression Level 1 :
> - postgres at 70-100% CPU and pg_dump at 20%-100%
> - dual core is definitely useful
> - dump size 544MB
> - dump time 2m33.337s
>
> Since this box is mostly idle right now, eating CPU for compression is no
> problem...
>

I get very different (contradictory) behavior. Server with fast RAID, 32GB
RAM, 2 x 4-core 3.16GHz Xeon 54xx CPUs, CentOS 5.2, PostgreSQL 8.3.6.
There was no disk wait time during any test. One run beforehand was used to
prime the disk cache.
In the figures below, 100% CPU means one core fully used and 800% means the
system is fully loaded.

pg_dump > file (on a subset of the DB with lots of tables with small
tuples)
6m 27s, 4.9GB; 12.9MB/sec
50% CPU in postgres, 50% CPU in pg_dump

pg_dump -Fc > file.gz
9m6s, output is 768M (6.53x compression); 9.18MB/sec
30% CPU in postgres, 70% CPU in pg_dump

pg_dump | gzip > file.2.gz
6m22s, 13MB/sec.
50% CPU in postgres, 50% CPU in pg_dump, 50% CPU in gzip

The default (5) compression level was used.
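
(For what it's worth, the custom-format compression level can be pinned
explicitly with pg_dump's -Z option. A sketch only -- "mydb" is a placeholder
database name, and I have not re-run the numbers above with these:

    pg_dump -Fc -Z 1 mydb > file.fast.dump    # gzip level 1 inside pg_dump
    pg_dump -Fc -Z 9 mydb > file.small.dump   # gzip level 9, much more CPU

Treat these as illustrative invocations, not measurements.)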

So, when using pg_dump alone, I could not get significantly more than one
core of CPU used (all on the same box). No matter how I tried, pg_dump plus
the postgres process dumping data always totaled about 102% -- it would
fluctuate in top, give or take 15% at times, but the sum of the two always
stayed within about 3% of that total.

Piping the whole thing to gzip gets some speedup. This suggests that either
pg_dump's implementation or use of gzip is suboptimal (or its zlib is an
older or slower version), or that compressing inside pg_dump fails to
pipeline CPU usage as well as piping does, since the piped run above shows
50% more total CPU utilization.
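
(One variable in that comparison: the external gzip defaults to level 6,
while pg_dump reportedly used level 5 as noted above. An apples-to-apples
rerun would pin the level on the pipe as well; a sketch, with "mydb" again a
placeholder and untested here:

    pg_dump mydb | gzip -5 > file.2.gz    # match the level used by -Fc above
)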

I can run the same test later with a single 10GB table (which does
dump much faster than 13MB/sec and has rows averaging about 500 bytes in
size). But overall I have found pg_dump's performance sorely lacking, and
in the big picture that is a data risk. Postgres is very good about not
losing data, but that only goes up to the limits of the hardware and OS,
which is not good enough. Because of long disaster recovery times and poor
replication/contingency features, it is a fairly unsafe place for data once
the data grows beyond a certain size and a business continuity (BC) plan
requires minimal downtime.

> Adding an option to use LZO instead of gzip could be useful...
>
> Compressing the uncompressed 2.6GB dump :
>
> - gzip -1 :
>
> - compressed size : 565 MB
> - compression throughput : 28.5 MB/s
> - decompression throughput : 74 MB/s
>
> - LZO -1 :
> - compressed size : 696M
> - compression throughput : 86 MB/s
> - decompression throughput : 247 MB/s
>
> Conclusion : LZO could help for fast disks (RAID) or slow disks on a
> CPU-starved server...
>

LZO would be a great option; it is very fast, especially at decompression.
With gzip, one rarely gains by going below gzip -3 or above gzip -6.
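
For anyone wanting to reproduce the gzip-vs-LZO numbers above on an existing
plain dump, something like the following should do it, assuming the lzop
command-line tool (an LZO frontend) is installed; "dump.sql" is a placeholder
file name:

    time gzip -1 -c dump.sql > dump.sql.gz      # gzip level 1 compression
    time gzip -dc dump.sql.gz > /dev/null       # gzip decompression
    time lzop -1 -c dump.sql > dump.sql.lzo     # LZO level 1 compression
    time lzop -dc dump.sql.lzo > /dev/null      # LZO decompression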

