I experienced this morning a performance problem when we imported a
dump in a 8.1 database.
The table is 5 millions rows large and when the dump creates an index
on a specific text column called clazz it takes 27 minutes while on
the other columns, it only takes a couple of seconds:
LOG: duration: 1636301.317 ms statement: CREATE INDEX
index_journal_clazz ON journal USING btree (clazz);
LOG: duration: 20613.009 ms statement: CREATE INDEX
index_journal_date ON journal USING btree (date);
LOG: duration: 10653.290 ms statement: CREATE INDEX
index_journal_modifieur ON journal USING btree (modifieur);
LOG: duration: 15031.579 ms statement: CREATE INDEX
index_journal_objectid ON journal USING btree (objectid);
The only weird thing about this column is that 4.7 millions of rows
have the exact same value. A partial index excluding this value is
really fast to create but, as the database is used via JDBC and
prepared statements, this index is totally useless (the plan is
created before the BIND so it can't use the partial index). FWIW we
can't use ?protocolVersion=2 with this application so it's not an
As part of the deployment process of this application, we often need
to drop/create/restore the database and 25 minutes is really longer
than we can afford.
So my questions are:
- is the index creation time so correlated with the distribution? I
was quite surprised by this behaviour. The time is essentially CPU
- if not, what can I check to diagnose this problem?
- IIRC, 8.3 could allow me to use the partial index as the query
should be planned after the BIND (plans are unnamed). Am I right?
Thanks for any input.
pgsql-performance by date
|Next:||From: Tom Lane||Date: 2008-05-22 13:14:36|
|Subject: Re: Index creation time and distribution |
|Previous:||From: Robins Tharakan||Date: 2008-05-22 00:54:36|
|Subject: Re: Varchar pkey instead of integer|