
Re: database 1.2G, pg_dump 73M?!

From: "Joris Dobbelsteen" <Joris(at)familiedobbelsteen(dot)nl>
To: "Ross Boylan" <RossBoylan(at)stanfordalumni(dot)org>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: database 1.2G, pg_dump 73M?!
Date: 2008-03-30 21:59:36
Message-ID: E4953B65D9E5054AA6C227B410C56AA91B5F36@exchange1.joris2k.local
Lists: pgsql-general
>-----Original Message-----
>From: Ross Boylan [mailto:RossBoylan(at)stanfordalumni(dot)org] 
>Sent: Sunday, 30 March 2008 23:43
>To: Joris Dobbelsteen
>Cc: RossBoylan(at)stanfordalumni(dot)org; pgsql-general(at)postgresql(dot)org
>Subject: Re: [GENERAL] database 1.2G, pg_dump 73M?!
>
>On Sun, 2008-03-30 at 21:22 +0200, Joris Dobbelsteen wrote:
>>  From the top contenders, about half are indexes, so you are stuck
>> with ~200 MB of data in the tables.
>> PostgreSQL has some wasted space due to placement of the tuples in a
>> block and overhead for each block and row. I don't know those values,
>> but they are in the range of 24 bytes per tuple, I believe. Secondly,
>> a block is 8 KB by default and tuples cannot be stored across
>> multiple blocks (that's what TOAST works around).
>> 
>> All in all: look up tuple sizes. If they are small, the overhead
>> from PostgreSQL can be a big factor; if they are huge, you lose on
>> portions of unoccupied space in blocks. I believe pg_statistics will
>> provide this information.
>There is a pg_statistic (no "s")  table, but I don't know how 
>to get tuple size from it--the documentation refers to the 
>source code to figure out the codes.  Backing up a step, I 
>don't know what a tuple is in Postgres, and don't see an entry 
>for it in the index.

It was pg_stats.
It provides avg_width, the average stored width in bytes, per column.

So probably you want
SELECT tablename, SUM(avg_width)
FROM pg_stats
WHERE schemaname = 'public'
GROUP BY tablename;
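Note that the sum of avg_width only covers the column data; each row
also carries fixed overhead on disk. A rough back-of-the-envelope
sketch (the 24-byte tuple header, 4-byte line pointer, and 24-byte page
header below are my assumed approximations, not exact figures for every
version):

```python
# Rough estimate of rows per page, given the summed avg_width of a table.
# Assumed overheads: ~24-byte tuple header, 4-byte line pointer per row,
# ~24-byte page header; alignment padding is ignored for simplicity.
PAGE_SIZE = 8192
PAGE_HEADER = 24
TUPLE_HEADER = 24
LINE_POINTER = 4

def est_tuples_per_page(sum_avg_width):
    """Estimate how many rows of the given data width fit in one 8 KB page."""
    per_tuple = TUPLE_HEADER + LINE_POINTER + sum_avg_width
    return (PAGE_SIZE - PAGE_HEADER) // per_tuple

# e.g. rows whose columns sum to 100 bytes of data:
print(est_tuples_per_page(100))  # -> 63
```

For narrow rows the fixed overhead dominates: a 40-byte row costs 68
bytes on disk under these assumptions, roughly 70% overhead.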

>Is a tuple just a row?  That's what the docs say for the following
>report:

Yes.
Also, where I typed "block", read "page".

># select distinct relname, reltuples, relpages from pg_class 
>where relkind='r' and substring(relname, 1, 3) != 'pg_';
>
>         relname         | reltuples | relpages 
>-------------------------+-----------+----------
> basefiles               |         0 |        0
> cdimages                |         0 |        0
> client                  |         2 |        1
> counters                |         1 |        1
> device                  |         0 |        0
> file                    |    650659 |    11558
> filename                |    623012 |     3958
> fileset                 |        22 |        1
> job                     |       384 |       10
> jobmedia                |       596 |        7
> location                |         0 |        0
> locationlog             |         0 |        0
> log                     |         0 |        0
> media                   |       245 |        9
> mediatype               |         2 |        1
> path                    |     67908 |      966
> pool                    |         5 |        1
> sql_features            |       439 |        6
> sql_implementation_info |        12 |        1
> sql_languages           |         4 |        1
> sql_packages            |        10 |        1
> sql_parts               |         9 |        1
> sql_sizing              |        23 |        1
> sql_sizing_profiles     |         0 |        0
> status                  |        19 |        1
> storage                 |         2 |        1
> unsavedfiles            |         0 |        0
> version                 |         1 |        1

Do relpages * 8192 / reltuples.
Default installs use 8 KB (8192-byte) pages/blocks.

For file that works out to ~146 bytes/tuple.
For filename it works out to ~52 bytes/tuple.

Probably you will see some significant differences here.
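The arithmetic above, applied to the reltuples/relpages figures from
your pg_class query (assuming the default 8192-byte page size):

```python
# Average on-disk bytes per row = pages * page size / row count.
PAGE_SIZE = 8192

def bytes_per_tuple(reltuples, relpages):
    return relpages * PAGE_SIZE / reltuples

print(round(bytes_per_tuple(650659, 11558)))  # file     -> 146
print(round(bytes_per_tuple(623012, 3958)))   # filename -> 52
```

This figure includes all per-row and per-page overhead plus any free
space in the pages, which is why it exceeds the raw column widths.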

>> Another factor is representation in the SQL dump might be more 
>> efficient than in the database, but this highly depends on your data 
>> set. For example, a int8 takes 8 bytes in a table, while it takes 
>> between 1 and ~20 in a SQL dump.
>> 
>> How the plain SQL dump becomes this small I cannot explain without
>> much more detail.
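
On the int8 point above, a quick illustration: the column occupies a
fixed 8 bytes in the table, while its text form in a plain SQL dump is
1 to 20 characters (19 digits plus a sign at the int8 extremes):

```python
# Text length of int8 values as they would appear in a plain SQL dump,
# versus the fixed 8 bytes they occupy in a table.
for v in (5, 123456, -9223372036854775808):
    print(v, "->", len(str(v)), "byte(s) as dump text vs 8 bytes on disk")
```

So small values shrink in the dump and extreme values grow, which is
one reason dump size and table size can diverge so much.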
>
>On Tom's point, bacula regularly inserts entries into the 
>tables and then, days to months later, deletes them.  As far 
>as I know, the VACUUM FULLs I just did were the first ever; I 
>did do several of them because I kept getting messages about 
>needing more fsm_pages.
>
>I am still trying to figure out if the database was getting 
>any automatic vacuuming at all.  The Postgres documentation 
>(the database is 8.2, though I'm moving to 8.3 soon) sounds as 
>if it's on automatically, but the Debian-specific 
>documentation suggests I may need to do some additional things 
>to enable it.
>
>Probably the fsm_pages being low also hurt disk usage, since 
>the message accompanying the vacuum said that's what happens 
>if fsm_pages is low.  It was 20k; vacuum said I needed 56k, 
>and I upped it to 80k.
>I'm not sure if my recent cleaning has brought the needed 
>fsm_pages down.
>
>I've only been doing partial backups for the last few months, 
>so there's probably less info in the tables than under normal 
>conditions.  I suppose it's possible the space I gained was 
>just a temporary win.

Looks like some configuration changes are needed to tune your
installation to better suit your needs. fsm_pages can be modified, but
I don't know whether any other parameters need changing along with it.
The documentation can help, and a lot of people here could probably do
it blindfolded.

- Joris

