Re: database 1.2G, pg_dump 73M?!

From: Ross Boylan <RossBoylan(at)stanfordalumni(dot)org>
To: Joris Dobbelsteen <joris(at)familiedobbelsteen(dot)nl>
Cc: RossBoylan(at)stanfordalumni(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: database 1.2G, pg_dump 73M?!
Date: 2008-03-30 21:43:26
Message-ID: 1206913406.8239.227.camel@corn.betterworld.us
Lists: pgsql-general


On Sun, 2008-03-30 at 21:22 +0200, Joris Dobbelsteen wrote:
> From the top contenders, about half are indexes, so you are stuck
> with ~200 MB of data in the tables.
> PostgreSQL has some wasted space due to the placement of tuples in a
> block and overhead for each block and row. I don't know the exact
> values, but they are in the range of 24 bytes per tuple, I believe.
> Secondly, a block is 8 KB by default and a tuple cannot be stored
> across multiple blocks (that's what TOAST is there to work around).
>
> All in all: look up the tuple sizes. If they are small, the overhead
> from PostgreSQL can be a big factor. If they are huge, you lose on
> portions of unoccupied space in blocks. I believe pg_statistics will
> provide this information.
There is a pg_statistic (no "s") table, but I don't know how to get
tuple sizes from it; the documentation refers you to the source code
to figure out the codes. Backing up a step, I don't know what a tuple
is in Postgres, and I don't see an entry for it in the documentation's
index.
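
If I understand the docs right, the friendlier route is the pg_stats
view, which presents pg_statistic in readable form; its avg_width
column appears to give the average stored width of each column in
bytes (assuming ANALYZE has run recently enough for the statistics to
be there):

# select tablename, attname, avg_width
  from pg_stats
  where schemaname = 'public'
  order by tablename, attname;

Summing avg_width across a table's columns should give a rough data
size per tuple, before the per-row overhead Joris mentions.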

Is a tuple just a row? That's what the docs say for the following
report:
# select distinct relname, reltuples, relpages from pg_class
  where relkind = 'r' and substring(relname, 1, 3) != 'pg_';

         relname         | reltuples | relpages
-------------------------+-----------+----------
 basefiles               |         0 |        0
 cdimages                |         0 |        0
 client                  |         2 |        1
 counters                |         1 |        1
 device                  |         0 |        0
 file                    |    650659 |    11558
 filename                |    623012 |     3958
 fileset                 |        22 |        1
 job                     |       384 |       10
 jobmedia                |       596 |        7
 location                |         0 |        0
 locationlog             |         0 |        0
 log                     |         0 |        0
 media                   |       245 |        9
 mediatype               |         2 |        1
 path                    |     67908 |      966
 pool                    |         5 |        1
 sql_features            |       439 |        6
 sql_implementation_info |        12 |        1
 sql_languages           |         4 |        1
 sql_packages            |        10 |        1
 sql_parts               |         9 |        1
 sql_sizing              |        23 |        1
 sql_sizing_profiles     |         0 |        0
 status                  |        19 |        1
 storage                 |         2 |        1
 unsavedfiles            |         0 |        0
 version                 |         1 |        1
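
If relpages and reltuples mean what I think they mean, dividing the
8 KB blocks by the row counts gives a rough bytes-per-row figure for
the two big tables: file comes to 11558 * 8192 / 650659 = ~145
bytes/row and filename to 3958 * 8192 / 623012 = ~52 bytes/row. As a
query (my guess at the arithmetic, anyway):

# select relname,
         (relpages::bigint * 8192 / reltuples)::int as bytes_per_row
  from pg_class
  where relkind = 'r' and reltuples > 0
    and substring(relname, 1, 3) != 'pg_'
  order by relpages desc;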

> Another factor is that the representation in the SQL dump might be
> more efficient than in the database, but this depends highly on your
> data set. For example, an int8 takes 8 bytes in a table, while it
> takes between 1 and ~20 bytes in a SQL dump.
>
> How the plain SQL dump becomes this small I cannot explain without
> many more details.
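
For what it's worth, that looks easy enough to check from psql,
assuming I have pg_column_size() right (it should report the on-disk
size of a value):

# select pg_column_size(7::int8);  -- 8 bytes in the table
# select length('7');              -- 1 character in the text dump

while something like -9000000000000000000 would dump as 20 characters
of text, which matches the 1 to ~20 range above.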

On Tom's point, Bacula regularly inserts entries into the tables and
then, days to months later, deletes them. As far as I know, the VACUUM
FULLs I just did were the first ever; I did several of them because I
kept getting messages about needing more fsm_pages.

I am still trying to figure out whether the database was getting any
automatic vacuuming at all. The Postgres documentation (the database
is 8.2, though I'm moving to 8.3 soon) sounds as if autovacuum is on
automatically, but the Debian-specific documentation suggests I may
need to do some additional things to enable it.
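
A quick check from psql seems possible, assuming I'm reading the 8.2
docs right that autovacuum also depends on the stats collector
settings:

# show autovacuum;
# show stats_start_collector;
# show stats_row_level;

Presumably all three have to be "on" in postgresql.conf before 8.2's
autovacuum will actually run.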

The low fsm_pages setting probably also hurt disk usage, since the
message accompanying the vacuum said that's what happens when
fsm_pages is too low. It was 20k; vacuum said I needed 56k, and I
upped it to 80k. I'm not sure whether my recent cleaning has brought
the needed fsm_pages down.
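
I suppose re-running a database-wide vacuum would tell me: if I
remember the output right, the last few lines of

# vacuum verbose;

report how many page slots the free space map is using and how many
it needs, which can be compared against max_fsm_pages in
postgresql.conf.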

I've only been doing partial backups for the last few months, so there's
probably less info in the tables than under normal conditions. I
suppose it's possible the space I gained was just a temporary win.

Ross
