Re: Naming-scheme for db-files

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Markus Wollny <Markus(dot)Wollny(at)computec(dot)de>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Naming-scheme for db-files
Date: 2002-08-29 11:24:14
Message-ID: Pine.GSO.4.44.0208291401340.8881-100000@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

To continue my previous message.

I just load test data (i've used in README.tsearch) and looked at sizes:

-rw------- 1 postgres users 23191552 Aug 29 14:08 53016937
-rw------- 1 postgres users 81059840 Aug 29 14:08 52639027

Table titles occupies 80Mb, index on txtidx column occupies 22Mb.
After doing

test=# select title into titles_tmp from titles;
SELECT

I got size of table 'titles' without txtidx field

-rw------- 1 postgres users 30105600 Aug 29 14:14 53016938

So, txtidx column itself occupies about 50Mb. I don't see any giga-sizes !

after running 'vacuum full analyze' I got:

-rw------- 1 postgres users 30105600 Aug 29 14:26 53016938
-rw------- 1 postgres users 36880384 Aug 29 14:26 53016937
-rw------- 1 postgres users 51494912 Aug 29 14:26 52639027
pg(at)zen:/usr/local/pgsql/data/base/52638986$ oid2name -d test -x|grep 53016937
53016937 = t_idx
pg(at)zen:/usr/local/pgsql/data/base/52638986$ oid2name -d test -x|grep 52639027
53016934 = pg_toast_52639027
53016936 = pg_toast_52639027_idx
52639027 = titles
pg(at)zen:/usr/local/pgsql/data/base/52638986$ oid2name -d test -x|grep 53016938
53016938 = titles_tmp

So, actual size of 'txtidx' field is 20 Mb !

---------------------------------------------------------------------------

test=# select count(*) from titles;
s count
--------
377905
(1 row)

test=# select version();
version
---------------------------------------------------------------
PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.95.3
(1 row)

test=# \d titles
Table "titles"
Column | Type | Modifiers
----------+------------------------+-----------
title | character varying(256) |
titleidx | txtidx |
Indexes: t_idx

There is one performance tips: Use CLUSTER command to speedup search query !
There was thread about recently in -hackers. In 7.2 CLUSTER works but
requires some addtional operations, in 7.3 it should works fine without
problems.

Regards,

Oleg

On Thu, 29 Aug 2002, Oleg Bartunov wrote:

> I didn't follow this thread, but I'm a little bit surprised !
> txtdix itself is just a parsed data from corresponding column !
> and txtidx stores only unique lexems, so size of txtidx column should
> be lesser than original one ! I'd be interested to see more details
> (if you could send me your data and sql script). Also,
> did you use oid2name from contrib to see sizes of tables and indices ?
> What is an output of 'select version()' ?
> 100,000 rows is not a big deal.
>
> regards,
>
> Oleg
>
>
> On Thu, 29 Aug 2002, Markus Wollny wrote:
>
> > Just a try on using txtidx for a speedy search over the e-mail-fields.
> > No success, though - fti would have been better for the task (like
> > finding all users who have got an aol.com-adress).
> >
> > > -----UrsprNngliche Nachricht-----
> > > Von: Martijn van Oosterhout [mailto:kleptog(at)svana(dot)org]
> > > Gesendet: Donnerstag, 29. August 2002 01:18
> > > An: Markus Wollny
> > > Cc: pgsql-general(at)postgresql(dot)org
> > > Betreff: Re: [GENERAL] Naming-scheme for db-files
> > >
> > >
> > > On Wed, Aug 28, 2002 at 11:25:44PM +0200, Markus Wollny wrote:
> > > > After dropping the mentioned column emailindex of type txtidx (via
> > > > copying the data to a temporary table and renaming this table
> > > > afterwards), disk usage of that table evaporated from
> > > 1862MB to a humble
> > > > 61MB. And emailindex was completely empty... Very strange,
> > > but ~60MB is
> > > > very much more more like what I had estimated the size of
> > > that table to
> > > > be.
> > >
> > > What on earth is:
> > >
> > > emailindex txtidx,
> > >
> > > Anyway?
> > >
> > > > -----UrsprУМngliche Nachricht-----
> > > > Von: Markus Wollny
> > > > Gesendet: Mi 28.08.2002 17:04
> > > > An: Martijn van Oosterhout
> > > > Cc: pgsql-general(at)postgresql(dot)org
> > > > Betreff: Re: [GENERAL] Naming-scheme for db-files
> > > >
> > > >
> > > >
> > > > Hi!
> > > >
> > > > Yes, I run vacuum every night - and log-output indicates no
> > > > errors at
> > > > all. Yet I am indeed quite puzzled about the size of this table.
> > > > Is
> > > > there some way of finding out which column consumes so much
> > > > space?
> > > >
> > > > select count(*) from ct_com_user return 95858 rows.
> > > > The CREATE-statement for this table is as follows:
> > > >
> > > > CREATE TABLE ct_com_user (
> > > > user_id numeric(10, 0),
> > > > login varchar(1000),
> > > > password varchar(1000),
> > > > status numeric(10, 0),
> > > > rights varchar(20) DEFAULT 'r',
> > > > firstname varchar(1000),
> > > > firstname_visible numeric(1, 0) DEFAULT 0,
> > > > lastname varchar(1000),
> > > > lastname_visible numeric(1, 0) DEFAULT 0,
> > > > clan varchar(1000),
> > > > clan_visible numeric(1, 0) DEFAULT 0,
> > > > street varchar(1000),
> > > > street_visible numeric(1, 0) DEFAULT 0,
> > > > zipcode varchar(1000),
> > > > zipcode_visible numeric(1, 0) DEFAULT 0,
> > > > city varchar(1000),
> > > > city_visible numeric(1, 0) DEFAULT 0,
> > > > country varchar(1000),
> > > > country_visible numeric(1, 0) DEFAULT 0,
> > > > phone1 varchar(1000),
> > > > phone1_visible numeric(1, 0) DEFAULT 0,
> > > > phone2 varchar(1000),
> > > > phone2_visible numeric(1, 0) DEFAULT 0,
> > > > mobile varchar(1000),
> > > > mobile_visible numeric(1, 0) DEFAULT 0,
> > > > fax varchar(1000),
> > > > fax_visible numeric(1, 0) DEFAULT 0,
> > > > email1 varchar(1000),
> > > > email1_visible numeric(1, 0) DEFAULT 0,
> > > > email2 varchar(1000),
> > > > email2_visible numeric(1, 0) DEFAULT 0,
> > > > icq varchar(1000),
> > > > icq_visible numeric(1, 0) DEFAULT 0,
> > > > homepage varchar(1000),
> > > > homepage_visible numeric(1, 0) DEFAULT 0,
> > > > description varchar(1000),
> > > > description_visible numeric(1, 0) DEFAULT 0,
> > > > hobbies varchar(1000),
> > > > hobbies_visible numeric(1, 0) DEFAULT 0,
> > > > signature1 varchar(4000),
> > > > signature2 varchar(4000),
> > > > signature3 varchar(4000),
> > > > forum_view varchar(20) DEFAULT 'flat',
> > > > temp_password varchar(100),
> > > > registered timestamptz,
> > > > last_login timestamptz,
> > > > created timestamptz DEFAULT ('now'::text)::timestamp(6) with
> > > > time
> > > > zone,
> > > > lines numeric(3, 0) DEFAULT 400,
> > > > lines_forum numeric(3, 0) DEFAULT 20,
> > > > forum_lines numeric(3, 0) DEFAULT 20,
> > > > forum_smileys varchar(50) DEFAULT 'enhanced',
> > > > site_id numeric(10, 0) DEFAULT 0,
> > > > flag_id numeric(10, 0) DEFAULT 0,
> > > > forum_quoting varchar(50) DEFAULT 'enhanced',
> > > > forum_flatpostings numeric(3, 0) DEFAULT 10,
> > > > forum_images numeric(1, 0) DEFAULT 2,
> > > > user_image numeric(1, 0) DEFAULT 0,
> > > > user_image_visible numeric(1, 0) DEFAULT 0,
> > > > chat_password varchar(50),
> > > > chat_last_visit timestamptz,
> > > > authorpoints_visible numeric(1, 0) DEFAULT 0,
> > > > emailindex txtidx,
> > > > CONSTRAINT idx_pk_ct_com_user UNIQUE (user_id),
> > > > CONSTRAINT idx_u_ct_com_user_login UNIQUE (login)
> > > > ) WITH OIDS;
> > > >
> > > > As hardly anyone actually uses a signature, hobby or
> > > > description, I
> > > > cannot imagine how this should amount to almost 2 GB of data
> > > > (1862MB) -
> > > > this would mean roughly 20k of data per row average - and no way
> > > > have we
> > > > got this amount of data - the absolute maximum data per row as
> > > > defined
> > > > would be about 30k, I guess, but that would mean that nearly all
> > > > of our
> > > > users would use nearly all available space and I know that this
> > > > is quite
> > > > far from true. I suspect it may have something to do with
> > > > emailindex of
> > > > the txtidx-type... As it's not absolutely necessary, I'll try
> > > > and drop
> > > > this column and see what I get...
> > > >
> > > > Regards,
> > > >
> > > > Markus
> > > >
> > > > > -----UrsprУМngliche Nachricht-----
> > > > > Von: Martijn van Oosterhout [mailto:kleptog(at)svana(dot)org]
> > > > > Gesendet: Mittwoch, 28. August 2002 16:30
> > > > > An: Markus Wollny
> > > > > Cc: pgsql-general(at)postgresql(dot)org
> > > > > Betreff: Re: [GENERAL] Naming-scheme for db-files
> > > > >
> > > > >
> > > > > On Wed, Aug 28, 2002 at 03:54:45PM +0200, Markus Wollny wrote:
> > > > > > Hi!
> > > > > >
> > > > > > As I was just checking disk-usage of a database (PostgreSQL
> > > > > 7.2.1), I
> > > > > > stumbled over some files named with a trailing .1 added to
> > > > the usual
> > > > > > oid. Now if a table 'example' with oid 12345 exists, what
> > > > > does the file
> > > > > > 12345.1 contain exactly? I didn't find anything about
> > > > > .1-files in the
> > > > > > documentation...
> > > > >
> > > > > Postgres splits files at 1GB. The .1 file would be the second
> > > > > part of the
> > > > > file. When that also reaches 1GB, you'll get a .2 file.
> > > > >
> > > > > You're doing VACUUM [FULL] regularly, right?
> > > > >
> > > > > --
> > > > > Martijn van Oosterhout <kleptog(at)svana(dot)org>
> > > > > http://svana.org/kleptog/
> > > > > > There are 10 kinds of people in the world, those that can do
> > > > binary
> > > > > > arithmetic and those that can't.
> > > > >
> > > >
> > > > ---------------------------(end of
> > > > broadcast)---------------------------
> > > > TIP 2: you can get off all lists at once with the unregister
> > > > command
> > > > (send "unregister YourEmailAddressHere" to
> > > > majordomo(at)postgresql(dot)org)
> > > >
> > > >
> > > >
> > > > ---------------------------(end of
> > > broadcast)---------------------------
> > > > TIP 1: subscribe and unsubscribe commands go to
> > > majordomo(at)postgresql(dot)org
> > >
> > > --
> > > Martijn van Oosterhout <kleptog(at)svana(dot)org>
> > > http://svana.org/kleptog/
> > > > There are 10 kinds of people in the world, those that can do binary
> > > > arithmetic and those that can't.
> > >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> >
> > http://www.postgresql.org/users-lounge/docs/faq.html
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jean-Christian Imbeault 2002-08-29 11:25:11 Data Type to represent a year/month value
Previous Message Tourtounis Sotiris 2002-08-29 11:03:35 Tuple, table, database object identifiers size