Re: COPY insert performance

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Chris Isaacson" <cisaacson(at)tradebotsystems(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: COPY insert performance
Date: 2005-07-26 00:31:02
Message-ID: BF0AD256.9F22%llonergan@greenplum.com
Lists: pgsql-performance

Chris,

You can try the Bizgres distribution of Postgres (based on version 8.0.3),
whose COPY support is 30% faster as reported by OSDL (without indexes). Stock
COPY spends much of its time in very slow parsing, which Bizgres speeds up
with micro-optimized parsing logic. A patch implementing the same code is
pending for the development version of Postgres, but you can use Bizgres and
get it now instead of waiting for Postgres 8.1 to come out. Bizgres is also QA
tested with the enhanced features.
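
To make the parsing cost concrete: in COPY's default text format each row is a
tab-delimited line with backslash escapes, and the server must undo that
escaping byte-by-byte on load, which is the hot path the micro-optimization
targets. A minimal sketch of the client-side half (not from the patch itself,
just an illustration of the format):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Escape one field for COPY's default text format (tab delimiter,
// backslash escapes). The server's parser reverses exactly this work
// for every byte of every row, which is why parsing dominates COPY time.
std::string escape_copy_field(const std::string &field) {
    std::string out;
    out.reserve(field.size());
    for (char c : field) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '\t': out += "\\t";  break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Join one row's fields into a single COPY text line.
std::string copy_line(const std::vector<std::string> &fields) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i) line += '\t';
        line += escape_copy_field(fields[i]);
    }
    line += '\n';
    return line;
}
```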

Bizgres is a free / open source distribution of Postgres for Business
Intelligence / Data Warehousing.

Bizgres currently features postgres 8.0.3 plus these patches:
* Bypass WAL when performing "CREATE TABLE AS SELECT"
* COPY is between 30% and 90% faster on machines with fast I/O
* Enhanced support for data partitioning with partition elimination
optimization
* Bitmap Scan support for multiple index use in queries and better low
cardinality column performance
* Improved optimization of queries with LIMIT

See: http://www.bizgres.org for more.

- Luke

On 7/25/05 3:32 PM, "Chris Isaacson" <cisaacson(at)tradebotsystems(dot)com> wrote:

> I need COPY via libpqxx to insert millions of rows into two tables. One table
> has roughly half as many rows and requires half the storage. In production,
> the largest table will grow by ~30M rows/day. To test the COPY performance I
> split my transactions into 10,000 rows. I insert roughly 5000 rows into table
> A for every 10,000 rows into table B.
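>
> [Editorial note: the 10,000-row chunking described above can be sketched as
> below; `chunk_rows` is a hypothetical helper, and the libpqxx call that
> would feed each batch to one COPY transaction is omitted since it needs a
> live server.]

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split a stream of pre-formatted rows into fixed-size batches, one batch
// per COPY transaction, mirroring the 10,000-row chunks used in the test.
std::vector<std::vector<std::string>>
chunk_rows(const std::vector<std::string> &rows, std::size_t batch_size) {
    std::vector<std::vector<std::string>> batches;
    for (std::size_t i = 0; i < rows.size(); i += batch_size) {
        std::size_t end = std::min(rows.size(), i + batch_size);
        batches.emplace_back(rows.begin() + i, rows.begin() + end);
    }
    return batches;
}
```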
>
> Table A has one unique index:
>
> "order_main_pk" UNIQUE, btree (cl_ord_id)
>
> Table B has 1 unique index and 2 non-unique indexes:
>
> "order_transition_pk" UNIQUE, btree (collating_seq)
> "order_transition_ak2" btree (orig_cl_ord_id)
> "order_transition_ak3" btree (exec_id)
>
> My testing environment is as follows:
> -Postgresql 8.0.1
> -libpqxx 2.5.0
> -Linux 2.6.11.4-21.7-smp x86_64
> -Dual Opteron 246
> -System disk (postgres data resides on this SCSI disk) - Seagate (ST373453LC)
> - 15K, 73 GB
> (http://www.seagate.com/cda/products/discsales/marketing/detail/0,1081,549,00.html)
> -2nd logical disk - 10K, 36GB IBM SCSI (IC35L036UCDY10-0) - WAL reside on this
> disk
> -NO RAID
>
> PostgreSQL
> Here are the results of copying in 10M rows as fast as possible:
> (10K/transaction)
> Total Time: 1129.556 s
> Rows/sec: 9899.922
> Transaction>1.2s 225
> Transaction>1.5s 77
> Transaction>2.0s 4
> Max Transaction 2.325s
>
> MySQL
> I ran a similar test with MySQL 4.1.10a (InnoDB) which produced these results:
> (I used MySQL's INSERT INTO x VALUES (1,2,3)(4,5,6)(...,...,...) syntax)
> (10K/transaction)
> Total Time: 860.000 s
> Rows/sec: 11627.91
> Transaction>1.2s 0
> Transaction>1.5s 0
> Transaction>2.0s 0
> Max Transaction 1.175s
>
> Considering the configurations shown below, can anyone offer advice to close
> the 15% gap and to reduce the much worse variability I'm experiencing? Thanks.
>
> My postgresql.conf has the following non-default values:
> # -----------------------------
> # PostgreSQL configuration file
> # -----------------------------
> listen_addresses = '*' # what IP interface(s) to listen on;
> max_connections = 100
> #---------------------------------------------------------------------------
> # RESOURCE USAGE (except WAL)
> #---------------------------------------------------------------------------
> shared_buffers = 65536 # min 16, at least max_connections*2, 8KB each
> work_mem = 2048 # min 64, size in KB
> maintenance_work_mem = 204800 # min 1024, size in KB
> max_fsm_pages = 2250000 # min max_fsm_relations*16, 6 bytes each
> bgwriter_delay = 200 # 10-10000 milliseconds between rounds
> bgwriter_percent = 10 # 0-100% of dirty buffers in each round
> bgwriter_maxpages = 1000 # 0-1000 buffers max per round
> #---------------------------------------------------------------------------
> # WRITE AHEAD LOG
> #---------------------------------------------------------------------------
> fsync = false # turns forced synchronization on or off
> wal_buffers = 64 # min 4, 8KB each
> checkpoint_segments = 40 # in logfile segments, min 1, 16MB each
> checkpoint_timeout = 600 # range 30-3600, in seconds
> #---------------------------------------------------------------------------
> # QUERY TUNING
> #---------------------------------------------------------------------------
> effective_cache_size = 65536 # typically 8KB each
> random_page_cost = 2 # units are one sequential page fetch cost
> #---------------------------------------------------------------------------
> # ERROR REPORTING AND LOGGING
> #---------------------------------------------------------------------------
> log_min_duration_statement = 250 # -1 is disabled, in milliseconds.
> log_connections = true
> log_disconnections = true
> log_duration = true
> log_line_prefix = '<%r%u%p%t%d%%' # e.g. '<%u%%%d> '
> # %u=user name %d=database name
> # %r=remote host and port
> # %p=PID %t=timestamp %i=command tag
> # %c=session id %l=session line number
> # %s=session start timestamp %x=transaction id
> # %q=stop here in non-session processes
> # %%='%'
> log_statement = 'none' # none, mod, ddl, all
> #---------------------------------------------------------------------------
> # RUNTIME STATISTICS
> #---------------------------------------------------------------------------
> # - Query/Index Statistics Collector -
> stats_start_collector = true
> stats_command_string = true
> stats_block_level = true
> stats_row_level = true
> stats_reset_on_server_start = true
>
> My MySQL my.ini has the following non default values:
> innodb_data_home_dir = /var/lib/mysql/
> innodb_data_file_path = ibdata1:10M:autoextend
> innodb_log_group_home_dir = /var/lib/mysql/
> innodb_log_arch_dir = /var/lib/mysql/
> # You can set .._buffer_pool_size up to 50 - 80 %
> # of RAM but beware of setting memory usage too high
> innodb_buffer_pool_size = 512M
> innodb_additional_mem_pool_size = 64M
> # Set .._log_file_size to 25 % of buffer pool size
> innodb_log_file_size = 128M
> innodb_log_buffer_size = 64M
> innodb_flush_log_at_trx_commit = 1
> innodb_lock_wait_timeout = 50
> innodb_flush_method = O_DSYNC
> max_allowed_packet = 16M
>
