Skip site navigation (1) Skip section navigation (2)

Re: postgresql meltdown on PlanetMath.org

From: Logan Bowers <logan(at)datacurrent(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, sean(at)chittenden(dot)org,pgsql-performance(at)postgresql(dot)org
Subject: Re: postgresql meltdown on PlanetMath.org
Date: 2003-03-18 02:41:07
Message-ID: Pine.LNX.4.53.0303171845190.8636@neo.magick.org (view raw or flat)
Thread:
Lists: pgsql-performance
All right, I performed a VACUUM FULL last night and after about 3 hours I
tried running a select count(*) FROM tblwordidx and that did help things
considerably (it runs in ~20 seconds instead of 1-2 minutes).  Not as good 
as originally, but close.  

But, here's the breakdown of the db:  

I'm using the database as a keyword based file search engine (not the most 
efficient method, I know, but it works well for my purposes).  The biggest 
and most relevant tables are a table of files and of words.  The basic 
operation that each file has a set of keywords associated with it, I do a 
whole word search on tblwordidx and join with tblfiles (I know, the naming 
scheme sucks, sorry!).  

Three times a day I scan the network and update the database.  I insert
about 180,000 rows into a temporary table and then use it to update
temporary table (tbltmp).  With the aid of a few other tables, I clean up
tblFiles so that existing rows have an updated timestamp in tblseen and
files with a timestamp older than 1 day are removed.  Then, I take the new
rows in tblfiles and use a perl script to add more words to tblwordidx.  
After each update a do a VACUUM and VACUUM ANALYZE which usually grinds
for 10 to 15 minutes.

I'm running this db on a celeron 450Mhz with 256MB RAM and a 60GB HDD
(7200 rpm).  For the most part I have the db running "well enough."  Over
time however, I find that performance degrades, the count(*) above is an
example of a command that does worse over time.  It gets run once an hour
for stats collection.  When I first migrated the db to v7.3.1 it would
take about 5-10 seconds (which it is close to now after a VACUUM FULL) but 
after a few weeks it would take over a minute of really intense HDD
activity.  Also of note is that when I first loaded the data it would
cache very well with the query taking maybe taking 15 seconds if I had 
just started the db after reboot, but when it was in its "slow" state 
repeating the query didn't noticably use the disk less (nor did it take 
less time).  

I've attached a VACUUM VERBOSE and my conf file (which is pretty vanilla,
I haven't tweaked it since updating).  If you have any suggestions on how
I can correct this situation through config changes that would be ideal
and thanks for your help, if is just a case of doing lots of VACUUM FULLs,
I can definitely see it as a performance bottleneck for postgres.  
Fortunately I can afford the huge peroformance penalty of a VACUUM FULL,
but I can certainly think of apps that can't.


Logan Bowers 

\d tblfiles: (219,248 rows)
  Column  |            Type             |                 Modifiers
----------+-----------------------------+-------------------------------------------
 fid      | integer                     | not null default 
nextval('fileids'::text)
 hid      | integer                     | not null
 pid      | integer                     | not null
 name     | character varying(256)      | not null
 size     | bigint                      | not null
Indexes: temp_fid_key unique btree (fid),
         filediridx btree (hid, pid, name, size, fid),
         fileidx btree (name, hid, pid, fid),
         fileidxfid btree (fid, name, pid)

\d tblwordidx: (1,739,481 rows)
          Table "public.tblwordidx"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 fid    | integer                | not null
 word   | character varying(128) | not null
 todel  | boolean                |
Indexes: wordidxfid btree (fid, word),
         wordidxfidonly btree (fid),
         wordidxw btree (word, fid)



On Mon, 17 Mar 2003, Tom Lane wrote:

> "Neil Conway" <neilc(at)samurai(dot)com> writes:
> > Sean Chittenden said:
> > A "select count(*) FROM tblwordidx" initially took about 1 second to
> > return a count of 2 million but after a few months it took several
> > minutes of really hard HDD grinding.
> >> 
> >> That's because there are dead entries in the index that weren't being
> >> reused or cleaned up.  As I said, this has been fixed.
> 
> > That's doubtful: "select count(*) FROM foo" won't use an index.
> 
> To know what's going on, as opposed to guessing about it, we'd need to
> know something about the physical sizes of the table and its indexes.
> "vacuum verbose" output would be instructive...
> 
> But my best theorizing-in-advance-of-the-data guess is that Logan's
> FSM settings are too small, causing free space to be leaked over time.
> If a vacuum full restores the original performance then that's probably
> the right answer.
> 
> 			regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 

Attachment: VACVERB
Description: text/plain (2.0 KB)
Attachment: postgresql.conf
Description: text/plain (5.3 KB)

In response to

Responses

pgsql-performance by date

Next:From: Christopher Kings-LynneDate: 2003-03-18 02:44:01
Subject: Re: postgresql meltdown on PlanetMath.org
Previous:From: Aaron KrowneDate: 2003-03-18 02:31:22
Subject: Re: postgresql meltdown on PlanetMath.org

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group