Skip site navigation (1) Skip section navigation (2)

Strange Create Index behaviour

From: Gary Doades <gpd(at)gpdnet(dot)co(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: Strange Create Index behaviour
Date: 2006-02-15 20:00:39
Message-ID: 43F38867.6010701@gpdnet.co.uk (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Platform: FreeBSD 6.0, Postgresql 8.1.2 compiled from the ports collection.

Not sure if this belongs in performance or bugs..

A pg_restore of my 2.5GB database was taking up to 2 hours to complete 
instead of the expected 10-15 minutes. Checking the server it was mostly 
CPU bound. Testing further I discovered that is was spending huge 
amounts of CPU time creating some indexes.

It took a while to find out, but basically it boils down to this:

If the column that is having the index created has a certain 
distribution of values then create index takes a very long time. If the 
data values (integer in this case) a fairly evenly distributed then 
create index is very quick, if the data values are all the same it is 
very quick. I discovered that in the slow cases the column had 
approximately half the values as zero and the rest fairly spread out. 
One column started off with around 400,000 zeros and the rest of the 
following rows spread between values of 1 to 500,000.

I have put together a test case that demonstrates the problem (see 
below). I create a simple table, as close in structure to one of my 
problem tables and populate an integer column with 100,000 zeros follow 
by 100,000 random integers between 0 and 100,000. Then create an index 
on this column. I then drop the table and repeat. The create index 
should take around 1-2 seconds. A fair proportion of the time it takes 
50 seconds!!!

If I fill the same row with all random data the create index always 
takes a second or two. If I fill the column with all zeros everything is 
still OK.

When my tables that I am trying to restore are over 2 million rows the 
creating one index can take an hour!! (almost all CPU time).

All other areas of performance, once the dump is restored and analysed 
seem to be OK, even large hash/merge joins and sorts

This is entirely repeatable in FreeBSD in that around half the time 
create index will be incredibly slow.

All postgresql.conf settings are at the defaults for the test initially 
(fresh install)

The final interesting thing is that as I increase shared buffers to 2000 
or 3000 the problem gets *worse*

The following text is output from the test script..

select version();
                                             version 

------------------------------------------------------------------------------------------------
  PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 
3.4.4 [FreeBSD] 20050518
(1 row)

\timing
Timing is on.

-----  Many slow cases, note the 50+ seconds cases

create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 81.859 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1482.141 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1543.508 ms
create index idx on atest(r);
CREATE INDEX
Time: 56685.230 ms

drop table atest;
DROP TABLE
Time: 4.616 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 6.889 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2009.787 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1828.663 ms
create index idx on atest(r);
CREATE INDEX
Time: 3991.257 ms

drop table atest;
DROP TABLE
Time: 3.796 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 19.965 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1625.059 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2622.827 ms
create index idx on atest(r);
CREATE INDEX
Time: 1082.799 ms

drop table atest;
DROP TABLE
Time: 4.627 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.953 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2068.744 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2671.420 ms
create index idx on atest(r);
CREATE INDEX
Time: 8047.660 ms

drop table atest;
DROP TABLE
Time: 3.675 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.582 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1723.987 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2263.131 ms
create index idx on atest(r);
CREATE INDEX
Time: 50050.308 ms

drop table atest;
DROP TABLE
Time: 52.744 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 25.370 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2052.733 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2631.317 ms
create index idx on atest(r);
CREATE INDEX
Time: 61440.897 ms

drop table atest;
DROP TABLE
Time: 26.137 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 24.794 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2851.977 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1553.046 ms
create index idx on atest(r);
CREATE INDEX
Time: 1774.920 ms


----  Fast (Normal?) cases

drop table atest;
DROP TABLE
Time: 4.422 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.543 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1516.246 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1407.400 ms
create index idx on atest(r);
CREATE INDEX
Time: 903.503 ms

drop table atest;
DROP TABLE
Time: 3.820 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 22.861 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1455.556 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2037.996 ms
create index idx on atest(r);
CREATE INDEX
Time: 718.286 ms

drop table atest;
DROP TABLE
Time: 4.503 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 3.448 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1523.540 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1261.473 ms
create index idx on atest(r);
CREATE INDEX
Time: 727.707 ms

drop table atest;
DROP TABLE
Time: 3.564 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.897 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1447.504 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1403.525 ms
create index idx on atest(r);
CREATE INDEX
Time: 754.577 ms

drop table atest;
DROP TABLE
Time: 4.633 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 3.196 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1618.544 ms
insert into atest (i,r,d1,d2) select 
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1530.450 ms
create index idx on atest(r);
CREATE INDEX
Time: 802.980 ms
drop table atest;
DROP TABLE
Time: 4.707 ms
mserver#

Regards,
Gary.

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2006-02-15 20:56:08
Subject: Re: Strange Create Index behaviour
Previous:From: RonDate: 2006-02-15 19:53:28
Subject: Re: Reliability recommendations

pgsql-hackers by date

Next:From: Tom LaneDate: 2006-02-15 20:56:08
Subject: Re: Strange Create Index behaviour
Previous:From: Chris BrowneDate: 2006-02-15 17:17:47
Subject: Re: Patch Submission Guidelines

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group