Skip site navigation (1) Skip section navigation (2)

index creation order?

From: Allen Landsidel <all(at)biosys(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: index creation order?
Date: 2003-10-31 16:02:24
Message-ID: 6.0.0.22.0.20031031104214.024853b0@pop.hotpop.com (view raw or flat)
Thread:
Lists: pgsql-performance
Yet another question.. thanks to everyone responding to all these so far.. ;)

This one is basically.. given I have a big table already in COPY format, 
about 28 million rows, all keys guaranteed to be unique, I'm trying to find 
out which of the following will get the import finished the fastest:

a) CREATE TABLE with no indexes or keys.  Run the COPY (fast, ~30min), then 
CREATE INDEX on each column it's needed on, and ALTER TABLE for the pk and 
each fk needed.

b) Same as above, but instead of ALTER TABLE -- ditch the FK, and CREATE 
UNIQUE INDEX on the PK.

c) CREATE TABLE with the PK/FK's in the table structure, CREATE INDEX on 
needed columns, then run the COPY.

d) .. is to c as b is to a .. Don't create PK/FK's, just CREATE UNIQUE 
INDEX after table creation, then run the COPY.

My gut instinct tells me that in order, fastest to slowest, it's going to 
be d,b,c,a; this is what I've experienced on other DBs such as MSSQL and 
Oracle.

If there isn't a significant difference between all of them, performance 
wise, I think something is dreadfully wrong here.  Running "a", the ALTER 
TABLE to add the PK ran for 17 hours and still wasn't finished.

The table without indexes or keys is:
CREATE TABLE foo (
id BIGINT NOT NULL DEFAULT nextval('foo_id_sequence'),
master_id BIGINT NOT NULL,
other_id INTEGER NOT NULL,
status INTEGER NOT NULL,
addtime TIMESTAMP WITH TIME ZONE DEFAULT now()
);

Details on machine and configuration are:

The machine is the same one I've mentioned before.. SMP AthlonMP 2800+ 
(2.1GHz), 4x18GB 15krpm SCSI RAID-0 with 256MB onboard cache on a 
quad-channel ICP-Vortex controller, 2GB physical memory.  Running FreeBSD 
RELENG_4, relevant filesystems with softupdates enabled and mounted noatime.

kernel options are:
maxusers        0

options         MAXDSIZ="(1536UL*1024*1024)" # maximum limit
options         MAXSSIZ="(512UL*1024*1024)"  # maximum stack
options         DFLDSIZ="(512UL*1024*1024)"  # default limit
options         VM_BCACHE_SIZE_MAX="(384UL*1024*1024)" # cache size upped 
from default 200MB
options         SYSVSHM                 #SYSV-style shared memory
options         SYSVMSG                 #SYSV-style message queues
options         SYSVSEM                 #SYSV-style semaphores
options         SHMMAXPGS=262144
options         SHMALL=262144
options         SHMSEG=256
options         SEMMNI=384
options         SEMMNS=768
options         SEMMNU=384
options         SEMMAP=384

postgresql.conf settings are:

shared_buffers = 30000
max_fsm_relations = 10000
max_fsm_pages = 2000000
max_locks_per_transaction = 64
wal_buffers = 128
sort_mem = 1310720 (1.2GB)
vacuum_mem = 262144 (256MB)
checkpoint_segments = 64
checkpoint_timeout = 1200
commit_delay = 20000
commit_siblings = 2
fsync=true
random_page_cost = 1.7
cpu_tuple_cost = 0.005
cpu_index_tuple_cost = 0.005
cpu_operator_cost = 0.0012

stats_start_collector = true
stats_command_string = true
stats_row_level = true
stats_block_level = true


Responses

pgsql-performance by date

Next:From: Rod TaylorDate: 2003-10-31 16:23:30
Subject: Re: index creation order?
Previous:From: Bill MoranDate: 2003-10-31 15:55:33
Subject: Re: Pg+Linux swap use

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group