I've after some opinions about insert performance.
I'm importing a file with 13,002 lines to a database that ends up with
75,703 records across 6 tables. This is a partial file – the real data
is 4 files with total lines 95174. I'll be loading these files each
morning, and then running a number of queries on them.
The select queries run fast enough, (mostly - 2 queries are slow but
I'll look into that later), but importing is slower than I'd like it
to be, but I'm wondering what to expect?
I've done some manual benchmarking running my script 'time script.pl'
I realise my script uses some of the time, bench marking shows that
%50 of the time is spent in dbd:execute.
Test 1, For each import, I'm dropping all indexes and pkeys/fkeys,
then importing, then adding keys and indexes. Then I've got successive
runs. I figure the reindexing will get more expensive as the database
Successive Imports: 44,49,50,57,55,61,72 (seconds)
= average 1051inserts/second (which now that I've written this seems
Test 2, no dropping etc of indexes, just INSERTs
Import – 61, 62, 73, 68, 78, 74 (seconds)
= average 1091 inserts/second
Machine is Linux 2.6.4, 1GB RAM, 3.something GHz XEON processor, SCSI
hdd's (raid1). PostgreSQL 7.4.2. Lightly loaded machine, not doing
much other than my script. Script and DB on same machine.
Sysctl –a | grep shm
kernel.shmmni = 4096
kernel.shmall = 134217728 (pages or bytes? Anyway…)
kernel.shmmax = 134217728
tcpip_socket = true
max_connections = 32
superuser_reserved_connections = 2
shared_buffers = 8192
sort_mem = 4096
vacuum_mem = 16384
max_fsm_relations = 300
fsync = true
wal_buffers = 64
checkpoint_segments = 10
effective_cache_size = 16000
syslog = 1
silent_mode = false
log_connections = true
log_pid = true
log_timestamp = true
stats_start_collector = true
stats_row_level = true
Can I expect it to go faster than this? I'll see where I can make my
script itself go faster, but I don't think I'll be able to do much.
I'll do some pre-prepare type stuff, but I don't expect significant
gains, maybe 5-10%. I'd could happily turn off fsync for this job, but
not for some other databases the server is hosting.
Any comments/suggestions would be appreciated.
pgsql-performance by date
|Next:||From: Matthew T. O'Connor||Date: 2004-10-20 02:12:10|
|Subject: Re: Insert performance, what should I expect?|
|Previous:||From: Simon Riggs||Date: 2004-10-20 00:33:16|
|Subject: Re: how much mem to give postgres?|