Re: Space for pg_dump

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: SHARMILA JOTHIRAJAH <sharmi_jo(at)yahoo(dot)com>, General postgres mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Space for pg_dump
Date: 2009-04-18 07:28:15
Message-ID: alpine.GSO.2.01.0904180322030.1205@westnet.com
Lists: pgsql-general

On Tue, 31 Mar 2009, Scott Marlowe wrote:

> Sadly, there is no exact maths for such things. If your database has
> tons of indexes and such, it might be 20 or 100 times bigger on disk
> than it will be during backup. If it's all compressible text with few
> indexes, it might be a 1:1 or so size.

Since running an entire pg_dump can take forever on a big database, what I
usually do here is start by running the disk usage query at
http://wiki.postgresql.org/wiki/Disk_Usage
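
The core of that query looks something like this (a trimmed-down sketch of
the sort of thing on the wiki page, not the full version):

SELECT relname,
       pg_size_pretty(pg_total_relation_size(C.oid)) AS total_size,
       pg_size_pretty(pg_relation_size(C.oid)) AS table_size
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind = 'r'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;

pg_total_relation_size counts indexes and TOAST on top of the main heap that
pg_relation_size reports, and the index portion of that gap is space that
never shows up in the dump at all.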

That makes it easier to see how much space is indexes vs. table data. Then,
for the bigger tables, I do something like this:

psql -c "COPY (select * from bigtable limit 100000) to stdout" | gzip > bigtable.gz
gzip -l bigtable.gz

That lets you sample a decent-sized chunk of the table to figure out what
compression ratio you're likely to get on the data in there. Given all the
table sizes and a compression ratio estimate, you can then make a fairly
accurate guess of how much space the whole dump is going to take up,
presuming your data is evenly distributed enough that the first records that
come back are typical of the whole thing.
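
To make the arithmetic concrete (the numbers here are made up just to show
the computation): say gzip -l on the 100,000 row sample reports 800000000
bytes uncompressed and 200000000 compressed. That's 8000 bytes of COPY
output per row compressing 4:1, so a table holding 10 million rows should
dump to roughly:

echo "10000000 * (800000000 / 100000) * (200000000 / 800000000)" | bc -l
# ~20000000000 bytes, i.e. about 20GB compressed (80GB if you don't
# compress the dump)

Repeat that for each of the big tables from the disk usage query and add up
the results.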

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
