Re: How often do I need to reindex tables?

From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How often do I need to reindex tables?
Date: 2007-02-28 19:08:58
Message-ID: 20070228140858.492efaf2.wmoran@collaborativefusion.com
Lists: pgsql-general

In response to Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Bill Moran <wmoran(at)collaborativefusion(dot)com> writes:
> > I don't agree. I think that regular indexing is mandatory under some
> > workloads. Example:
> > ...
> > There are some additional indexes that I've snipped from the output that also
> > saw some benefit from reindexing, but let's just focus on file_fp_idx.
>
> Can you describe the usage pattern of that index? I'm curious why it
> doesn't maintain reasonably static size. How often is the underlying
> table vacuumed?

bacula=# \d file
                              Table "public.file"
   Column   |  Type   |                       Modifiers
------------+---------+--------------------------------------------------------
 fileid     | integer | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid      | integer | not null
 pathid     | integer | not null
 filenameid | integer | not null
 markid     | integer | not null default 0
 lstat      | text    | not null
 md5        | text    | not null
Indexes:
    "file_pkey" PRIMARY KEY, btree (fileid)
    "file_fp_idx" btree (filenameid, pathid)
    "file_jobid_idx" btree (jobid)

Now, that table stores a record for each file that is backed up (i.e.,
there's a unique tuple for each time a file is backed up). To save space
in the database, the file name and file path are stored in separate
tables and referenced by an ID.
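
For reference, a typical lookup that reassembles full paths from those
ID references looks roughly like this (a sketch; the "name" and "path"
column names follow Bacula's standard schema and are assumed here):

-- Sketch: reassemble full file paths for one job by joining the
-- lookup tables back in. The jobid value is illustrative.
SELECT p.path || f.name AS full_path
  FROM file fl
  JOIN filename f ON f.filenameid = fl.filenameid
  JOIN path p     ON p.pathid     = fl.pathid
 WHERE fl.jobid = 42;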

This particular server has the following characteristics:
bacula=# select count(*) from file;
  count
---------
 8068956
(1 row)

bacula=# select count(*) from filename;
 count
--------
 151954
(1 row)

bacula=# select count(*) from path;
 count
-------
 49642
(1 row)

There are 21 jobs, each ranging in size from 2000 to 5000 files. Each job
runs twice a day. So you're looking at about 60,000 new rows at midnight
and 60,000 new rows at noon each day. With the purge cycle, about the
same number of rows are being deleted as are being added, so the table
size stays pretty constant.
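
If you want to watch that yourself, something like this shows the heap
staying flat while file_fp_idx grows (a sketch assuming 8.1 or later,
where these size functions are built in):

-- Compare heap size to index sizes; pg_relation_size and
-- pg_size_pretty are built-ins as of 8.1.
SELECT pg_size_pretty(pg_relation_size('file'))        AS table_size,
       pg_size_pretty(pg_relation_size('file_fp_idx')) AS fp_idx_size,
       pg_size_pretty(pg_relation_size('file_pkey'))   AS pkey_size;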

Because I know exactly when database activity is occurring on this system,
I have autovacuum disabled, and I manually run a vacuum analyze on this
database twice a day: once at 8:00 AM and again at 4:00 PM. I had to bump
max_fsm_pages up to 60000 to keep vacuum effective.
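
Concretely, the twice-daily maintenance amounts to running something
like this at 08:00 and 16:00 (a sketch; cron can drive it through
psql -c or vacuumdb --analyze):

-- Database-wide vacuum plus statistics refresh. Plain VACUUM (not
-- VACUUM FULL) is enough here, as long as max_fsm_pages is large
-- enough to track the pages it frees.
VACUUM ANALYZE;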

Note that the index under discussion is the only one in this database that
shows significant bloat. I could probably just reindex that one on a
regular schedule, but since I know when the database is quiescent, there's
no reason I can think of not to reindex the whole thing.
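
Either flavor works during the quiet window (REINDEX blocks writes to
the affected table while it runs, which is exactly why the quiescent
period matters):

-- Rebuild just the one bloated index ...
REINDEX INDEX file_fp_idx;

-- ... or rebuild every index in the database; this form must be run
-- while connected to the bacula database.
REINDEX DATABASE bacula;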

Anything else I can provide that would be useful?

--
Bill Moran
Collaborative Fusion Inc.
