Re: Issues with \copy from file

From: Sigurgeir Gunnarsson <sgunnars(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Issues with \copy from file
Date: 2009-12-18 12:46:37
Message-ID: ebd3ad520912180446m4a7eeaa3kf859e706f29205d6@mail.gmail.com
Lists: pgsql-performance

I hope this issue is still open, even though I haven't replied to it until now.

Euler mentioned that I did not provide any details about my system. I'm
running version 8.3 with mostly default settings on an old machine with 2 GB
of memory. The table definition is simple: four columns (id, value, x, y),
where id is the primary key and x and y are combined in an index.
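
For concreteness, the table looks roughly like this (the table name and the
column types here are my guesses for illustration; only the column names and
the index layout are real):

CREATE TABLE points (
    id    integer PRIMARY KEY,   -- type assumed for illustration
    value double precision,      -- type assumed
    x     integer,               -- type assumed
    y     integer                -- type assumed
);
CREATE INDEX points_x_y_idx ON points (x, y);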

I'm not sure if it matters, but contrary to Euler's suggestion I'm using
\copy instead of COPY. As for my comparison to MySQL, it is completely valid:
it was done on the same computer, using the same disk, on the same platform.
From that I would conclude that I/O is not my problem, unless PostgreSQL does
the I/O twice while MySQL does it only once.
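
In case it is unclear, the load runs from psql along these lines (table and
file names are made up here):

\copy points from '/tmp/data.txt'

as opposed to the server-side form:

COPY points FROM '/tmp/data.txt';

As I understand it, \copy reads the file on the client and streams it through
the connection, while COPY has the backend read the file directly (which in
8.3 also requires superuser rights), so I would not expect a large difference
between the two on a local machine.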

I guess my MySQL tables are InnoDB, since that is the default type (or so I
think). Wrapping the load in BEGIN/COMMIT did not change much, as far as I
could tell. Are there any other suggestions?
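
For reference, the pattern Matthew suggests below would look like this with
\copy (names again made up here):

BEGIN;
TRUNCATE TABLE points;
\copy points from '/tmp/data.txt'
COMMIT;

If I understand the replies correctly, it is the TRUNCATE inside the same
transaction that lets Postgres skip the WAL writes (with WAL archiving off,
as in my config), so a plain BEGIN/COMMIT without it would not be expected
to help.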

My postgresql.conf:
#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -
shared_buffers = 16MB # min 128kB or max_connections*16kB
temp_buffers = 16MB # min 800kB
#max_prepared_transactions = 5 # can be 0 or more
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 128MB # min 64kB
maintenance_work_mem = 128MB # min 1MB
#max_stack_depth = 2MB # min 100kB

# - Free Space Map -
max_fsm_pages = 2097152 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 500 # min 100, ~70 bytes each

# - Kernel Resource Usage -
#max_files_per_process = 1000 # min 25
#shared_preload_libraries = '' # (change requires restart)
# - Cost-Based Vacuum Delay -
#vacuum_cost_delay = 0 # 0-1000 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 1-10000 credits

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # immediate fsync at commit
#wal_sync_method = fsync # the default is the first option supported by the operating system
#full_page_writes = on # recover from partial page writes
#wal_buffers = 64kB # min 32kB
#wal_writer_delay = 200ms # 1-10000 milliseconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000

# - Checkpoints -
checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 is off

# - Archiving -
#archive_mode = off # allows archiving to be done
#archive_command = '' # command to use to archive a logfile segment
#archive_timeout = 0 # force a logfile segment switch after this number of seconds; 0 disables

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------

autovacuum = on # Enable autovacuum subprocess? 'on'

2009/10/19 Matthew Wakeling <matthew(at)flymine(dot)org>

> On Sun, 18 Oct 2009, Scott Marlowe wrote:
>
>> You can only write data then commit it so fast to one drive, and that
>> speed is usually somewhere in the megabyte per second range. 450+150
>> in 5 minutes is 120 Megs per second, that's pretty fast, but is likely
>> the max speed of a modern super fast 15k rpm drive. If it's taking 20
>> minutes then it's 30 Megs per second which is still really good if
>> you're in the middle of a busy afternoon and the db has other things
>> to do.
>>
>
> You're out by a factor of 60. That's minutes, not seconds.
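
(To spell out the correction: 450 + 150 = 600 MB in 5 minutes is
600 MB / 300 s = 2 MB/s, not 120 MB/s.)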
>
> More relevant is the fact that Postgres will normally log changes in the
> WAL, effectively writing the data twice. As Euler said, the trick is to tell
> Postgres that no one else will need to see the data, so it can skip the WAL
> step:
>
>
> BEGIN;
> TRUNCATE TABLE foo;
> COPY foo FROM ...;
> COMMIT;
>
> I see upward of 100MB/s over here when I do this.
>
> Matthew
>
> --
> Patron: "I am looking for a globe of the earth."
> Librarian: "We have a table-top model over here."
> Patron: "No, that's not good enough. Don't you have a life-size?"
> Librarian: (pause) "Yes, but it's in use right now."
>
