Re: why is pg_dump so much smaller than my database?

From: Carson Gross <carsongross(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: why is pg_dump so much smaller than my database?
Date: 2012-03-29 16:41:28
Message-ID: CAO92UoGjUPRR0A+JOsWQvxTZKTPzZPOgxwrJQScQ+Wuos4K3=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Interesting. Is there a perf hit to having a big file on disk? My
understanding is that the primary thing that really matters is keeping your
active set in memory.

This is on Postgres 9.0.x, running on Heroku/ec2.

We do have extremely compressible data so it may be that the dump is
compressed: I'm downloading it now to check.

Thanks for the replies,
Carson

On Thu, Mar 29, 2012 at 12:11 AM, John R Pierce <pierce(at)hogranch(dot)com> wrote:

> On 03/28/12 10:32 PM, Carson Gross wrote:
>
>> I've got a pretty big database (~30 gigs) and when I do a pg_dump, it
>> ends up only being 2 gigs.
>>
>> The database consists mainly of one very large table (w/ a few varchar
>> columns) which, according to pg_relation_size() is 10 gigs and
>> pg_total_relation_size() is 26 gigs (we need to drop some indexes there.)
>>
>> I'm just trying to get my head around the pg_dump being an order of
>> magnitude smaller than the darned database itself. I would thing that the
>> db would offer more efficient encoding for a lot of stuff vs. an ascii file.
>>
>>
> its quite possible your table has a lot of free tuples scattered through
> it as a result of updates or deletes. vacuum makes these available for
> reuse but does NOT free the disk space. ditto, your indexes might be very
> bloated, a reindex may significantly shrink them
>
> if you can afford some application downtime, you may consider running
> CLUSTER on that table, it will copy all the active tuples of the table to
> new file space, and free the old, and also does the reindex operation. I
> would vacuum the table first, after ensuring there aren't any old active
> transactions ('IDLE IN TRANSACTION' status in pg_stat_activity). Note
> that cluster takes an exclusive lock on the table, this is why I said you
> need some application downtime.
>
> you don't say what version you're running, older versions had more
> problems with bloating indexes than newer ones.
>
>
>
>
>
>
>
> --
> john r pierce N 37, W 122
> santa cruz ca mid-left coast
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-general<http://www.postgresql.org/mailpref/pgsql-general>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jonathan Bartlett 2012-03-29 16:43:52 Managing two sets of data in one database
Previous Message Albe Laurenz 2012-03-29 14:32:34 Re: More PG Log