Index creation time and distribution

From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Index creation time and distribution
Date: 2008-05-22 12:32:27
Message-ID: 1d4e0c10805220532v514c2bd5je4c6c20ceb485495@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi -performance,

I experienced this morning a performance problem when we imported a
dump in a 8.1 database.

The table is 5 millions rows large and when the dump creates an index
on a specific text column called clazz it takes 27 minutes while on
the other columns, it only takes a couple of seconds:
LOG: duration: 1636301.317 ms statement: CREATE INDEX
index_journal_clazz ON journal USING btree (clazz);
LOG: duration: 20613.009 ms statement: CREATE INDEX
index_journal_date ON journal USING btree (date);
LOG: duration: 10653.290 ms statement: CREATE INDEX
index_journal_modifieur ON journal USING btree (modifieur);
LOG: duration: 15031.579 ms statement: CREATE INDEX
index_journal_objectid ON journal USING btree (objectid);

The only weird thing about this column is that 4.7 millions of rows
have the exact same value. A partial index excluding this value is
really fast to create but, as the database is used via JDBC and
prepared statements, this index is totally useless (the plan is
created before the BIND so it can't use the partial index). FWIW we
can't use ?protocolVersion=2 with this application so it's not an
option.

As part of the deployment process of this application, we often need
to drop/create/restore the database and 25 minutes is really longer
than we can afford.

So my questions are:
- is the index creation time so correlated with the distribution? I
was quite surprised by this behaviour. The time is essentially CPU
time.
- if not, what can I check to diagnose this problem?
- IIRC, 8.3 could allow me to use the partial index as the query
should be planned after the BIND (plans are unnamed). Am I right?

Thanks for any input.

--
Guillaume

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2008-05-22 13:14:36 Re: Index creation time and distribution
Previous Message Robins Tharakan 2008-05-22 00:54:36 Re: Varchar pkey instead of integer