
From: Jean-David Beyer <jeandavid8(at)verizon(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Is there anything special about pg_dump's compression?
Date: 2007-11-16 05:00:56
Message-ID: 473D2408.8060406@verizon.net
Lists: pgsql-sql

Andrew Sullivan wrote:
> On Thu, Nov 15, 2007 at 11:05:44AM -0500, Jean-David Beyer wrote:
>> Does pg_dump's compression do anything really special that it is not
>> likely the tape drive already does? The drive claims 2:1 compression
>> for average data (e.g., not already compressed stuff like .jpeg files).
>>
>
> It's zlib, if I recall correctly. So probably not.
>
I turned the software compression off. It took:

524487428 bytes (524 MB) copied, 125.394 seconds, 4.2 MB/s

When I let the software compression run, the dump came to only 30 MBytes, so whatever
compression it uses is very good on this kind of data:

29810260 bytes (30 MB) copied, 123.145 seconds, 242 kB/s
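
(For the record, the two runs were essentially pg_dump piped through dd to the drive, along these lines; the database name, tape device, block size, and format flag below are placeholders from memory, not the literal commands:)

# software compression turned off (-Z0), writing through dd to the tape drive
pg_dump -Fc -Z0 mydb | dd of=/dev/nst0 bs=64k

# same dump with pg_dump's zlib compression enabled (-Z6)
pg_dump -Fc -Z6 mydb | dd of=/dev/nst0 bs=64k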

Since a database of that size was probably entirely in RAM, I would not expect
much IO time. Also, the drive's data-transfer light was on much of the time
rather than just giving short blinks. Turning the compression off did not seem
to lighten the CPU load much: when running uncompressed, the postgres server
process got 100% of a CPU and the client took about 12% of another. I imagine
the client does the compression and the writing to tape, and the server just
picks the data up from shared_buffers (= 253000 buffers at 8 KB each, i.e.,
about 2 GBytes). When the client is compressing, it takes about 40% of a
processor; when it is not compressing, it takes about 12%.
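
(The shared_buffers arithmetic, for anyone checking: 253000 buffers x 8192 bytes comes to 2,072,576,000 bytes, i.e. roughly 2 GBytes. The setting itself can be read back with psql; "mydb" is just a placeholder here:)

# confirm the shared_buffers setting on the server
psql -d mydb -c "SHOW shared_buffers;"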

If I am right, it seems to take a lot of effort just to pick the database up
from RAM, if doing that requires 100% of a 3.06 GHz Xeon processor. The tape
drive (Exabyte VXA-2) has a 12 MB/sec transfer rate, so it ought to be the
limiting factor, but it does not seem to be: I do not notice a whole lot of
IO-Wait time (though there is some).
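
(For what it is worth, this is roughly how I was watching the CPU and IO-Wait while the dump ran; the exact commands are from memory and the 5-second interval is arbitrary:)

# per-process CPU for the backend and the pg_dump client
top -b -d 5 | grep -E 'postgres|pg_dump'

# system-wide CPU, IO-wait and run-queue figures every 5 seconds
vmstat 5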

Any idea why the server is compute-limited just reading from the shared
buffers and delivering the data to the client to write to tape? Is it that I
have too many shared buffers and should reduce the setting from about
2 GBytes? Does it search the shared buffers sequentially or something? I made
shared_buffers large so I could fit at least all the active indices, and
preferably the hot data pages as well.

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 23:15:01 up 23 days, 16:33, 2 users, load average: 5.25, 5.32, 5.34
