
Re: pg_dump far too slow

From: David Newall <postgresql(at)davidnewall(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com, dcrooke(at)gmail(dot)com
Subject: Re: pg_dump far too slow
Date: 2010-03-21 13:17:40
Message-ID: 4BA61C74.70506@davidnewall.com
Lists: pgsql-performance
Thanks for all of the suggestions, guys, which gave me some pointers on 
new directions to look, and I learned some interesting things.

The first interesting thing was that piping (uncompressed) pg_dump into 
gzip, instead of using pg_dump's internal compressor, does bring a lot 
of extra parallelism into play.  (Thank you, Matthew Wakeling.)  I 
observed gzip using 100% CPU, as expected, and also two, count them, two 
postgres processes collecting data, each consuming a further 80% CPU.  
It seemed to me that Postgres was starting and stopping these to match 
the capacity of the consumer (i.e. pg_dump and gzip.)  Very nice.  
Unfortunately one of these processes dropped out eventually, and, according 
to top, the only non-idle process running was gzip (100%.)  Obviously 
there were postgres and pg_dump processes, too, but they were throttled 
by gzip's rate of output and effectively idle (less than 1% CPU).  That 
is also interesting.  The final output from gzip was being produced at 
the rate of about 0.5MB/second, which seems almost unbelievably slow.
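
In outline, the difference between the two runs is just where the 
compression happens; something like this (database and file names are 
placeholders):

    # internal compression: pg_dump compresses its own plain-text output
    pg_dump -Z6 mydb > mydb.sql.gz

    # external compression: uncompressed dump piped through gzip, which
    # then runs as a separate process on another core
    pg_dump -Z0 mydb | gzip > mydb.sql.gz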

I next tried Tom Lane's suggestion, COPY WITH BINARY, which produced the 
complete 34GB file in 30 minutes (a good result.)  I then compressed 
that with gzip, which took an hour and reduced the file to 32GB (hardly 
worth the effort) for a total run time of 90 minutes.  In that instance, 
gzip produced output at the rate of 10MB/second, so I tried pg_dump -Z0 
to see how quickly that would dump the file.  I had the idea that I'd go 
on to see how quickly gzip would compress it, but unfortunately it 
filled my disk before finishing (87GB at that point), so there's 
something worth knowing: pg_dump's output for binary data is very much 
less compact than COPY WITH BINARY; all those backslashes, as Tom 
pointed out.  For the aforementioned reason, I didn't get to see how 
gzip would perform.  For the record, pg_dump with no compression 
produced output at the rate of 26MB/second; a rather meaningless number 
given the 200%+ expansion of final output.
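
Roughly, the commands behind those numbers were along these lines (table 
name, database name and paths are illustrative only; a server-side COPY 
to a file runs as the postgres user and needs superuser rights):

    # binary COPY of the table holding the PDFs, straight to a file
    psql mydb -c "COPY pdfs TO '/tmp/pdfs.bin' WITH BINARY"

    # compress the result separately, timing it
    time gzip /tmp/pdfs.bin

    # pg_dump with its compression disabled, for comparison
    pg_dump -Z0 mydb > mydb.sql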

I am now confident the performance problem is gzip's, not Postgres's, 
and wonder if I should read up on gzip to find out why it works so 
slowly on a pure text stream, albeit one representing PDFs, which are 
intrinsically fairly well compressed.  Given the spectacular job that 
postgres did in adjusting its rate of output to match the consumer 
process, I did wonder if there might have been a tragic interaction 
between postgres and gzip; perhaps postgres limits its rate of output to 
match gzip; and gzip tries to compress what's available, that being only 
a few bytes; and perhaps that might be so inefficient that it hogs the 
CPU; but I don't think that likely.  I had a peek at gzip's source 
(surprisingly readable) and on first blush it does seem that unfortunate 
input could result in only a few bytes being written each time through 
the loop, meaning only a few more bytes could be read in.
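
If I wanted to test that theory, one crude way would be to re-block the 
stream before gzip sees it, so that each write into gzip's stdin is 
large, and compare throughput (dd here is only acting as a re-blocker; 
the names and sizes are arbitrary):

    # collect pg_dump's output into 1MB writes before gzip reads it
    pg_dump -Z0 mydb | dd obs=1M 2>/dev/null | gzip > mydb.sql.gz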

Just to complete the report, I created a child table to hold the PDFs, 
which are static, and took a dump of just that table, then adjusted my 
backup command to exclude it.  Total size of the compressed backup sans 
PDFs is circa 7MB, taking around 30 seconds.
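
In outline (table, database and file names are placeholders):

    # one-off dump of the static PDF table
    pg_dump -t pdfs mydb | gzip > pdfs.sql.gz

    # regular backup, excluding that table
    pg_dump -T pdfs mydb | gzip > backup-sans-pdfs.sql.gz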
