Re: Is there anything special about pg_dump's compression?

From: Jean-David Beyer <jeandavid8(at)verizon(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Is there anything special about pg_dump's compression?
Date: 2007-11-16 12:54:11
Message-ID: 473D92F3.3020707@verizon.net
Lists: pgsql-sql

Tom Lane wrote:
> Jean-David Beyer <jeandavid8(at)verizon(dot)net> writes:
>> I turned the software compression off. It took:
>> 524487428 bytes (524 MB) copied, 125.394 seconds, 4.2 MB/s
>
>> When I let the software compression run, it uses only 30 MBytes. So whatever
>> compression it uses is very good on this kind of data.
>> 29810260 bytes (30 MB) copied, 123.145 seconds, 242 kB/s
>
> Seems to me the conclusion is obvious: you are writing about the same
> number of bits to physical tape either way.

I guess so. I _am_ impressed by how much compression is achieved.

> The physical tape speed is
> surely the real bottleneck here, and the fact that the total elapsed
> time is about the same both ways proves that about the same number of
> bits went onto tape both ways.

I do not get that. If the physical tape speed is the bottleneck, why is the
rate only about 242 kB/s in the software-compressed case, but 4.2 MB/s with
software compression off? The tape drive usually gives over 6 MB/s when BRU
(similar to find | cpio) backs up the rest of my system, where not all the
files compress very much. Also, during a BRU backup, CPU usage is well under
100%. If I am right, the postgres server is using 100% of a CPU, while the
client (pg_dump), which is the one that actually compresses when software
compression is enabled, is using either 40% or 12%.
>
> The quoted MB and MB/s numbers are not too comparable because they are
> before and after compression respectively.
>
> The software compression seems to be a percent or two better than the
> hardware's compression, but that's not enough to worry about really.

Agreed. The times for backup (and restore) are acceptable. Being new to
postgres, I am just interested in how it works from a user's point of view.
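
To make the arithmetic behind the two quoted dd reports concrete, here is a
small Python sketch. It only recomputes the rates from the byte counts and
elapsed times quoted above. The last step assumes, as suggested in the quoted
reply, that the drive's hardware compression achieves roughly the same ratio
as the software compression; that is an assumption, not something measured
here, and the variable names are purely illustrative.

```python
# Figures copied from the two dd reports quoted earlier in this message.
RAW_BYTES = 524_487_428     # software compression off
RAW_SECONDS = 125.394
GZ_BYTES = 29_810_260       # software compression on
GZ_SECONDS = 123.145


def rate(nbytes, seconds):
    """Average throughput in bytes per second."""
    return nbytes / seconds


# Rates as dd sees them: before hardware compression in the first case,
# after software compression in the second.
raw_rate = rate(RAW_BYTES, RAW_SECONDS)
gz_rate = rate(GZ_BYTES, GZ_SECONDS)

# How much smaller the software-compressed dump is.
ratio = RAW_BYTES / GZ_BYTES

# ASSUMPTION: the tape hardware compresses about as well as the software.
# Under that assumption, this is the post-compression rate in the first case.
raw_rate_after_hw = raw_rate / ratio

print(f"rate, software compression off: {raw_rate / 1e6:.2f} MB/s")
print(f"rate, software compression on:  {gz_rate / 1e3:.0f} kB/s")
print(f"compression ratio:              {ratio:.1f} : 1")
print(f"first case after compression:   {raw_rate_after_hw / 1e3:.0f} kB/s (assumed)")
```

This only puts the two rates on the same side of compression. It says nothing
about which component is the actual bottleneck.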

> What you should ask yourself is whether you have other uses for the main
> CPU's cycles during the time you're taking backups. If so, offload the
> compression cycles onto the tape hardware. If not, you might as well
> gain the one or two percent win.

Sure, I always have something to do with the excess cycles, though it is not
an obsession of mine.

But out of intellectual curiosity, why is the postgres _server_ taking 100%
of a CPU during a backup when it is the postgres _client_ that is actually
running the tape drive, especially if the backup is tape I/O limited?

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 07:40:01 up 24 days, 58 min, 0 users, load average: 4.30, 4.29, 4.21
