All right, I performed a VACUUM FULL last night and after about 3 hours I
tried running a select count(*) FROM tblwordidx and that did help things
considerably (it runs in ~20 seconds instead of 1-2 minutes). Not as good
as originally, but close.
But, here's the breakdown of the db:
I'm using the database as a keyword based file search engine (not the most
efficient method, I know, but it works well for my purposes). The biggest
and most relevant tables are a table of files and of words. The basic
operation that each file has a set of keywords associated with it, I do a
whole word search on tblwordidx and join with tblfiles (I know, the naming
scheme sucks, sorry!).
Three times a day I scan the network and update the database. I insert
about 180,000 rows into a temporary table and then use it to update
temporary table (tbltmp). With the aid of a few other tables, I clean up
tblFiles so that existing rows have an updated timestamp in tblseen and
files with a timestamp older than 1 day are removed. Then, I take the new
rows in tblfiles and use a perl script to add more words to tblwordidx.
After each update a do a VACUUM and VACUUM ANALYZE which usually grinds
for 10 to 15 minutes.
I'm running this db on a celeron 450Mhz with 256MB RAM and a 60GB HDD
(7200 rpm). For the most part I have the db running "well enough." Over
time however, I find that performance degrades, the count(*) above is an
example of a command that does worse over time. It gets run once an hour
for stats collection. When I first migrated the db to v7.3.1 it would
take about 5-10 seconds (which it is close to now after a VACUUM FULL) but
after a few weeks it would take over a minute of really intense HDD
activity. Also of note is that when I first loaded the data it would
cache very well with the query taking maybe taking 15 seconds if I had
just started the db after reboot, but when it was in its "slow" state
repeating the query didn't noticably use the disk less (nor did it take
I've attached a VACUUM VERBOSE and my conf file (which is pretty vanilla,
I haven't tweaked it since updating). If you have any suggestions on how
I can correct this situation through config changes that would be ideal
and thanks for your help, if is just a case of doing lots of VACUUM FULLs,
I can definitely see it as a performance bottleneck for postgres.
Fortunately I can afford the huge peroformance penalty of a VACUUM FULL,
but I can certainly think of apps that can't.
\d tblfiles: (219,248 rows)
Column | Type | Modifiers
fid | integer | not null default
hid | integer | not null
pid | integer | not null
name | character varying(256) | not null
size | bigint | not null
Indexes: temp_fid_key unique btree (fid),
filediridx btree (hid, pid, name, size, fid),
fileidx btree (name, hid, pid, fid),
fileidxfid btree (fid, name, pid)
\d tblwordidx: (1,739,481 rows)
Column | Type | Modifiers
fid | integer | not null
word | character varying(128) | not null
todel | boolean |
Indexes: wordidxfid btree (fid, word),
wordidxfidonly btree (fid),
wordidxw btree (word, fid)
On Mon, 17 Mar 2003, Tom Lane wrote:
> "Neil Conway" <neilc(at)samurai(dot)com> writes:
> > Sean Chittenden said:
> > A "select count(*) FROM tblwordidx" initially took about 1 second to
> > return a count of 2 million but after a few months it took several
> > minutes of really hard HDD grinding.
> >> That's because there are dead entries in the index that weren't being
> >> reused or cleaned up. As I said, this has been fixed.
> > That's doubtful: "select count(*) FROM foo" won't use an index.
> To know what's going on, as opposed to guessing about it, we'd need to
> know something about the physical sizes of the table and its indexes.
> "vacuum verbose" output would be instructive...
> But my best theorizing-in-advance-of-the-data guess is that Logan's
> FSM settings are too small, causing free space to be leaked over time.
> If a vacuum full restores the original performance then that's probably
> the right answer.
> regards, tom lane
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
Description: text/plain (2.0 KB)
In response to
pgsql-performance by date
|Next:||From: Christopher Kings-Lynne||Date: 2003-03-18 02:44:01|
|Subject: Re: postgresql meltdown on PlanetMath.org |
|Previous:||From: Aaron Krowne||Date: 2003-03-18 02:31:22|
|Subject: Re: postgresql meltdown on PlanetMath.org|