Skip site navigation (1) Skip section navigation (2)

Index creation time and distribution

From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Index creation time and distribution
Date: 2008-05-22 12:32:27
Message-ID: 1d4e0c10805220532v514c2bd5je4c6c20ceb485495@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Hi -performance,

I experienced this morning a performance problem when we imported a
dump in a 8.1 database.

The table is 5 millions rows large and when the dump creates an index
on a specific text column called clazz it takes 27 minutes while on
the other columns, it only takes a couple of seconds:
LOG:  duration: 1636301.317 ms  statement: CREATE INDEX
index_journal_clazz ON journal USING btree (clazz);
LOG:  duration: 20613.009 ms  statement: CREATE INDEX
index_journal_date ON journal USING btree (date);
LOG:  duration: 10653.290 ms  statement: CREATE INDEX
index_journal_modifieur ON journal USING btree (modifieur);
LOG:  duration: 15031.579 ms  statement: CREATE INDEX
index_journal_objectid ON journal USING btree (objectid);

The only weird thing about this column is that 4.7 millions of rows
have the exact same value. A partial index excluding this value is
really fast to create but, as the database is used via JDBC and
prepared statements, this index is totally useless (the plan is
created before the BIND so it can't use the partial index). FWIW we
can't use ?protocolVersion=2 with this application so it's not an
option.

As part of the deployment process of this application, we often need
to drop/create/restore the database and 25 minutes is really longer
than we can afford.

So my questions are:
- is the index creation time so correlated with the distribution? I
was quite surprised by this behaviour. The time is essentially CPU
time.
- if not, what can I check to diagnose this problem?
- IIRC, 8.3 could allow me to use the partial index as the query
should be planned after the BIND (plans are unnamed). Am I right?

Thanks for any input.

-- 
Guillaume

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2008-05-22 13:14:36
Subject: Re: Index creation time and distribution
Previous:From: Robins TharakanDate: 2008-05-22 00:54:36
Subject: Re: Varchar pkey instead of integer

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group