Re: When/if to Reindex

From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Steven Flatt <steven(dot)flatt(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: When/if to Reindex
Date: 2007-07-18 18:52:51
Message-ID: 20070718185251.GD39272@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Jul 18, 2007 at 01:08:30PM -0400, Steven Flatt wrote:
> We're using Postgres 8.2.4.
>
> I'm trying to decide whether it's worthwhile to implement a process that
> does periodic reindexing. In a few ad hoc tests, where I've tried to set up
> data similar to how our application does it, I've noticed decent performance
> increases after doing a reindex as well as the planner being more likely to
> choose an index scan.
>
> Some background: we make extensive use of partitioned tables. In fact, I'm
> really only considering reindexing partitions that have "just closed". In
> our simplest/most general case, we have a table partitioned by a timestamp
> column, each partition 24 hours wide. The partition will have an index on
> the timestamp column as well as a few other indexes including a primary key
> index (all b-tree). Is there a programmatic way I can decide, upon the
> "closing" of a partition, which, if any, of these indexes will benefit from
> a reindex? Can I determine things like average node density, node depth, or
> any other indication as to the quality of an index? Will pg_class.relpages
> be any help here?

Looking at that stuff will help determine if the index is bloated, or if
it's just bigger than optimal. Once you're done writing to an index, it
might be worth reindexing with a fillfactor of 100% to shrink things
down a bit.

> Is it a simple matter of running some queries, reindexing the table, then
> running the queries again to determine overall performance change? If so,
> what queries would exercise this best?
>
> Just trying to determine if the extra cost of reindexing newly closed
> partitions will be worth the performance benefit of querying the data.
> Reindexing a table with a day's worth of data is taking on the order of a
> few hours (10s of millions of rows).
>
> The docs say that:
>
> "...for B-tree indexes a freshly-constructed index is somewhat faster to
> access than one that has been updated many times, because logically adjacent
> pages are usually also physically adjacent in a newly built index... It
> might be worthwhile to reindex periodically just to improve access speed."

That's the other consideration, though if you're seeing a big difference
I suspect it's an issue of indexes fitting in cache or not.
--
Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Thomas Finneid 2007-07-18 19:08:08 Re: insert vs select into performance
Previous Message Jim C. Nasby 2007-07-18 18:50:07 Re: [PERFORM] Parrallel query execution for UNION ALL Queries