
COPY LOCK for WAL bypass

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: COPY LOCK for WAL bypass
Date: 2005-12-10 12:07:26
Message-ID: 1134216446.26373.9.camel@localhost.localdomain
Lists: pgsql-patches
The following patch implements COPY ... FROM ... LOCK, as discussed in
earlier threads this year.

The purpose of the new command is to make an explicit request to run
COPY without producing WAL records (i.e. no logging), so as to improve
the performance of data loads. (This is the first of a number of COPY
performance optimizations, discussed on -hackers).

Default COPY is unchanged. 
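For illustration, the intended usage would look something like this (the table and file names here are examples of mine, not taken from the patch):

```sql
-- Plain COPY: WAL-logged, behaviour unchanged
COPY mytable FROM '/tmp/mytable.dat';

-- New form: takes a strong lock on mytable and may skip WAL
COPY mytable FROM '/tmp/mytable.dat' LOCK;
```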

The LOCK option takes an EXCLUSIVE lock (though perhaps that should be a
SHARE ROW EXCLUSIVE?), allowing it to block out concurrent CREATE INDEX
builds and other writers.

The LOCK option will also cause writing of WAL records to be skipped when
WAL archiving is not active (!XLogArchivingActive()) and the table has no
indexes.

The implementation refactors the code used by CTAS for syncing the data
to disk once command is complete; COPY and CTAS now both use that code.

COPY ... LOCK doesn't write any XLog records, as was previously
suggested. My reasoning: no other heap-based xlog action would leave the
table in a consistent state after a failure, and nobody wants to see a
random single row in the table. I looked into UPDATEing the last row
loaded to generate an xlog record, but it would be difficult to do that
without being horribly kludgy. I looked at adding a new xlog action, but
there is only one slot left for a heap-based xlog action, so that seemed
unwise. I then wrote this using an RM_XLOG xlrec, but discovered that
doesn't actually trigger a commit write. So I've simply set a flag to
tell the transaction to record the commit anyway. That flag was already
there in heapam.c, but just for temp relations; I've renamed the
variable to indicate what it does now, across a number of files.

(It is also arguable that I should implement a WAL record that truncates
the file back down to its starting size in the event of a failure. I'm
not sure where we ended up on that; there seem to be comments both for
and against the idea. I can see the use for it, so I'll be happy to add
that as well, if we agree there is no danger.)

I've added a few LOCK cases to the copy.sql test script, but not (yet) a
full test suite.

No docs, yet.

Short performance test shown below for 10^6 rows in a one-column table
(four runs each):
1. Normal COPY:  4.5s  11.4s  6.0s  6.1s
2. COPY LOCK:    3.0s   2.7s  2.8s  2.7s
with postgresql.conf all default apart from: checkpoint_segments=30

This test was an "all in cache" test. The improvement is substantial,
but the numbers above are best case, IMHO: I would expect only 10-40%
improvement for larger loads in the general case.

Short output shown below, with checkpoint_segments=3, so timings for the
standard non-LOCK COPY probably include checkpoint time also.

postgres=# create table ctest as select
generate_series(1,1000000)::integer as col1;
postgres=# copy ctest to '/usr/local/pgsql/';
postgres=# truncate ctest;
Time: 41.343 ms
postgres=# copy ctest from '/usr/local/pgsql/';
Time: 7111.205 ms
postgres=# truncate ctest;
Time: 23.175 ms
postgres=# copy ctest from '/usr/local/pgsql/' lock;
Time: 2992.482 ms
postgres=# truncate ctest;
Time: 8.306 ms
postgres=# copy ctest from '/usr/local/pgsql/';
Time: 7433.166 ms

Best Regards, Simon Riggs

Attachment: copylock.patch
Description: text/x-patch (20.6 KB)


