Re: database 1.2G, pg_dump 73M?!

From: Ross Boylan <RossBoylan(at)stanfordalumni(dot)org>
To: Joris Dobbelsteen <joris(at)familiedobbelsteen(dot)nl>
Cc: RossBoylan(at)stanfordalumni(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: database 1.2G, pg_dump 73M?!
Date: 2008-03-30 18:54:38
Message-ID: 1206903278.8239.212.camel@corn.betterworld.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Sun, 2008-03-30 at 20:27 +0200, Joris Dobbelsteen wrote:
> Ross Boylan wrote:
> > I have a postgres server for which du reports
> > 1188072 /var/lib/postgresql/8.2/main
> > on Linux system.
> > The server has only one real database, which is for bacula. When I dump
> > the database, it's 73Mg.
> >
> > This is immediately after I did a full vacuum and restarted the server.
> >
> > Also,
> > bacula=> SELECT relname, relpages FROM pg_class ORDER BY relpages DESC
> > limit 15;
> > relname | relpages
> > ---------------------------------+----------
> > file_jpfid_idx | 27122
> > file_pathid_idx | 17969
> > file_jobid_idx | 17948
> > file_pkey | 14580
> > file_fp_idx | 12714
> > file | 11558
> > file_filenameid_idx | 9806
> > filename | 3958
> > filename_name_idx | 2510
> > filename_pkey | 1367
> > path | 966
> > path_name_idx | 950
> > path_pkey | 151
> > pg_attribute_relid_attnam_index | 46
> > pg_proc | 45
> >
> > It seems very strange to me that there is such a difference in size
> > between the dump and the database: the data store is almost 15 time
> > larger than the dump.
> >
> > Is this to be expected (e.g., from the indices taking up disk space)?
> > Is there anything I can do to reclaim some disk space
> There are a few factors you need to take into account:
>
> * Data storage in the database is packed into blocks and contains
> header data. Since data needs to be put into blocks there is a
> potential for waisting space. If you are unlucky it can become
> nearly a single row in the worst case.
> * You need to vacuum often, to ensure obsolete rows are removed and
> space can be reused.
> * Tables are not reduced in size and only grown. I thinks cluster
> and vacuum full will reduce the size of your table.
> * Indexes are not in the backup, they are derived from the table
> data on a restore.
> If you remove the indexes you are left with 150~200 MB of data (I
> guessed).
> Doing reindex will rebuild the index and get rid of all the bloat
> it has been collected during use. _I recommend you try this_, as
> your indexes on the file table look quite huge.
reindexing had a huge effect. After reindex the top tables (file,
filename and path) I now see
SELECT relname, relpages FROM pg_class ORDER BY relpages DESC limit 15;
relname | relpages
---------------------------------+----------
file | 11558
filename | 3958
filename_name_idx | 2383
file_jpfid_idx | 2145
file_fp_idx | 1787
file_jobid_idx | 1427
file_pathid_idx | 1427
file_pkey | 1427
file_filenameid_idx | 1427
filename_pkey | 1367
path | 966
path_name_idx | 871
path_pkey | 151
pg_attribute_relid_attnam_index | 46
pg_proc | 45
and du now reports 451M. That still seems a bit large, given the size
of the sql dump, but it's almost 2/3 lower than it was before.

Thanks so much!

I guess I need to figure out how to reindex automatically.
>
> But the most important factor for you will be the following:
>
> * Backups are compressed. Since you store filenames and paths these
> will have a very high amount of regularity and therefore are very
> good targets for compression. This can save a huge amount of data.
> If you take a compression factor of 50%~70% you will reach your 70 MB.
I don't see how this is relevant, since my dump file was plain text
(sql).

> Ow, server restarts will not help reduce your database size. In fact,
> nothing at all should change, except lower performance until sufficient
> cached data is back in the cache again.
>
> Hope this helps...

It was a huge help.
>
> - Joris

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2008-03-30 19:21:58 Re: Locale / Encoding mismatch
Previous Message postgre 2008-03-30 18:47:55 Re: [GENERAL] Re: [GENERAL] postgreSQL multithreading