Re: database 1.2G, pg_dump 73M?!

From: "Joris Dobbelsteen" <Joris(at)familiedobbelsteen(dot)nl>
To: "Ross Boylan" <RossBoylan(at)stanfordalumni(dot)org>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: database 1.2G, pg_dump 73M?!
Date: 2008-03-30 21:59:36
Message-ID: E4953B65D9E5054AA6C227B410C56AA91B5F36@exchange1.joris2k.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>-----Original Message-----
>From: Ross Boylan [mailto:RossBoylan(at)stanfordalumni(dot)org]
>Sent: Sunday, 30 March 2008 23:43
>To: Joris Dobbelsteen
>Cc: RossBoylan(at)stanfordalumni(dot)org; pgsql-general(at)postgresql(dot)org
>Subject: Re: [GENERAL] database 1.2G, pg_dump 73M?!
>
>On Sun, 2008-03-30 at 21:22 +0200, Joris Dobbelsteen wrote:
>> From the top contenders, about half are indexes, so you are stuck
>> with ~200 MB of data in the tables.
>> Postgresql has some wasted space due to placement of the tuples in a
>> block and overhead for each block and row. I don't know
>those values,
>> but they are in the range of 24 bytes per tuple, I believe.
>Secondly a
>> block is 8 KB by default and tuples cannot be stored into multiple
>> blocks (thats what toast to work around).
>>
>> All in all: Lookup tuple sizes, if they are small than the overhead
>> from postgresql can be a big factor. If you are huge you loose on
>> portions of unoccupied space in blocks. I believe pg_statistics will
>> provide this information.
>There is a pg_statistic (no "s") table, but I don't know how
>to get tuple size from it--the documentation refers to the
>source code to figure out the codes. Backing up a step, I
>don't know what a tuple is in Postgres, and don't see an entry
>for it in the index.

It was pg_stats.
You get avg_width. It gives this per column.

So probably you want
SELECT tablename, SUM(avg_width)
FROM pg_stats
WHERE schemaname = 'public'
GROUP BY tablename;

>Is a tuple just a row? That's what the docs say for the following
>report:

Yes.
Also where I typed "block" it will refer to "page".

># select distinct relname, reltuples, relpages from pg_class
>where relkind='r' and substring(relname, 1, 3) != 'pg_';
>
> relname | reltuples | relpages
>-------------------------+-----------+----------
> basefiles | 0 | 0
> cdimages | 0 | 0
> client | 2 | 1
> counters | 1 | 1
> device | 0 | 0
> file | 650659 | 11558
> filename | 623012 | 3958
> fileset | 22 | 1
> job | 384 | 10
> jobmedia | 596 | 7
> location | 0 | 0
> locationlog | 0 | 0
> log | 0 | 0
> media | 245 | 9
> mediatype | 2 | 1
> path | 67908 | 966
> pool | 5 | 1
> sql_features | 439 | 6
> sql_implementation_info | 12 | 1
> sql_languages | 4 | 1
> sql_packages | 10 | 1
> sql_parts | 9 | 1
> sql_sizing | 23 | 1
> sql_sizing_profiles | 0 | 0
> status | 19 | 1
> storage | 2 | 1
> unsavedfiles | 0 | 0
> version | 1 | 1

Do relpages * 8096 / reltuples.
Default installs have 8 KB pages/blocks by default.

For file it should be ~144 bytes/tuple.
For filename it should be ~51 bytes/tuple.

Probably you will get some signficant differences here.

>> Another factor is representation in the SQL dump might be more
>> efficient than in the database, but this highly depends on your data
>> set. For example, a int8 takes 8 bytes in a table, while it takes
>> between 1 and ~20 in a SQL dump.
>>
>> How the plain SQL dump becomes this small I cannot explain without
>> much much more details.
>
>On Tom's point, bacula regularly inserts entries into the
>tables and then, days to months later, deletes them. As far
>as I know, the VACUUM FULLs I just did were the first ever; I
>did do several of them because I kept getting messages about
>needing more fsm_pages.
>
>I am still trying to figure out if the database was getting
>any automatic vacuuming at all. The Postgres documentation
>(the database is 8.2, though I'm moving to 8.3 soon) sounds as
>if it's on automatically, but the Debian-specific
>documentation suggests I may need to do some additional things
>to enable it.
>
>Probably the fsm_pages being low also hurt disk useage, since
>the message accompanying the vacuum said that's what happens
>if fsm_pages is low. It was 20k; vacuum said I needed 56k,
>and I upped it to 80k.
>I'm not sure if my recent cleaning has brought the needed
>fsm_pages down.
>
>I've only been doing partial backups for the last few months,
>so there's probably less info in the tables than under normal
>conditions. I suppose it's possible the space I gained was
>just a temporary win.

Looks like some configuration changes are needed to tune your
installation to better suite you needs. fsm_pages can be modified, but I
don't know if any other parameters might need change for that.
Documentation can help and probably a lot of people here can do and tell
it blindfolded.

- Joris

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ross Boylan 2008-03-30 22:23:22 Re: database 1.2G, pg_dump 73M?!
Previous Message Ivan Sergio Borgonovo 2008-03-30 21:56:01 shortcut for select * where id in (arrayvar)