Re: database 1.2G, pg_dump 73M?!

From: Joris Dobbelsteen <joris(at)familiedobbelsteen(dot)nl>
To: Ross Boylan <RossBoylan(at)stanfordalumni(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: database 1.2G, pg_dump 73M?!
Date: 2008-03-30 18:27:22
Message-ID: 47EFDB8A.6070606@familiedobbelsteen.nl
Lists: pgsql-general

Ross Boylan wrote:
> I have a postgres server for which du reports
> 1188072 /var/lib/postgresql/8.2/main
> on Linux system.
> The server has only one real database, which is for bacula. When I dump
> the database, it's 73 MB.
>
> This is immediately after I did a full vacuum and restarted the server.
>
> Also,
> bacula=> SELECT relname, relpages FROM pg_class ORDER BY relpages DESC
> limit 15;
> relname | relpages
> ---------------------------------+----------
> file_jpfid_idx | 27122
> file_pathid_idx | 17969
> file_jobid_idx | 17948
> file_pkey | 14580
> file_fp_idx | 12714
> file | 11558
> file_filenameid_idx | 9806
> filename | 3958
> filename_name_idx | 2510
> filename_pkey | 1367
> path | 966
> path_name_idx | 950
> path_pkey | 151
> pg_attribute_relid_attnam_index | 46
> pg_proc | 45
>
> It seems very strange to me that there is such a difference in size
> between the dump and the database: the data store is almost 15 times
> larger than the dump.
>
> Is this to be expected (e.g., from the indices taking up disk space)?
> Is there anything I can do to reclaim some disk space
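For scale, the relpages figures quoted above translate directly into on-disk sizes, since PostgreSQL's default page size is 8 kB (assuming a default build). A quick sketch:

```python
# Rough size accounting from the relpages listing quoted above.
# Assumes PostgreSQL's default 8 kB page size.
PAGE = 8192

tables = {"file": 11558, "filename": 3958, "path": 966}
indexes = {
    "file_jpfid_idx": 27122, "file_pathid_idx": 17969, "file_jobid_idx": 17948,
    "file_pkey": 14580, "file_fp_idx": 12714, "file_filenameid_idx": 9806,
    "filename_name_idx": 2510, "filename_pkey": 1367,
    "path_name_idx": 950, "path_pkey": 151,
}

table_mb = sum(tables.values()) * PAGE / 1024**2
index_mb = sum(indexes.values()) * PAGE / 1024**2
print(f"tables ~{table_mb:.0f} MB, indexes ~{index_mb:.0f} MB")
```

The indexes clearly dominate the on-disk footprint, by a factor of about six over the table data.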
There are a few factors you need to take into account:

* Data in the database is packed into fixed-size blocks, and each
block and row carries header data. Because rows must fit into
blocks, there is a potential for wasting space; in the worst case
a block may end up holding only a single row.
* You need to vacuum often, to ensure obsolete rows are removed and
their space can be reused.
* Tables only ever grow; they are never shrunk automatically. I
think CLUSTER and VACUUM FULL will reduce the size of your tables.
* Indexes are not stored in the backup; they are rebuilt from the
table data on a restore.
If you remove the indexes you are left with 150~200 MB of data
(a guess).
Running REINDEX will rebuild each index and get rid of all the
bloat it has collected during use. _I recommend you try this_, as
the indexes on your file table look quite huge.
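A sketch of what that looks like in psql (table and index names taken from the listing above; note that these commands take exclusive locks, so run them while bacula is idle):

```sql
-- Rebuild all indexes on the bloated table, discarding accumulated bloat:
REINDEX TABLE file;
-- Compact the table itself and return free space to the filesystem:
VACUUM FULL file;
-- Alternatively, CLUSTER rewrites the table in index order, which
-- also compacts it (8.2 syntax):
CLUSTER file_pkey ON file;
```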

But the most important factor for you will be the following:

* Backups are compressed. Since you store filenames and paths, the
data is highly regular and therefore a very good target for
compression. This can save a huge amount of space.
With a compression factor of 50%~70% you arrive at roughly your 73 MB.
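pg_dump's custom format compresses with zlib, and path-like data compresses extremely well because of its shared prefixes. A small sketch (the sample filenames are made up, but they resemble what a backup catalog stores):

```python
import zlib

# Hypothetical sample resembling bacula's file/path tables: many
# file paths sharing long common prefixes.
paths = [f"/usr/share/doc/package-{i:05d}/README.txt".encode()
         for i in range(20000)]
raw = b"\n".join(paths)
packed = zlib.compress(raw, 6)  # level 6 is zlib's default trade-off
print(f"raw {len(raw)} bytes -> compressed {len(packed)} bytes "
      f"({100 * len(packed) / len(raw):.0f}% of original)")
```

Real catalog data is less uniform than this sample, so the actual ratio will be worse, but still far better than typical text.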

Oh, and server restarts will not help reduce your database size. In
fact, nothing at all should change, except temporarily lower
performance until sufficient data is back in the cache again.

Hope this helps...

- Joris
