copy with compression progress n

From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: copy with compression progress n
Date: 2006-05-31 09:38:05
Message-ID: 447D63FD.9060609@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been playing around with COPYing large binary data, and implemented
a COMPRESSION transfer format. The server side compression saves
significant bandwidth, which may be the major limiting factor when large
amounts of data is involved (i.e. in many cases where COPY TO/FROM
STDIN/STDOUT is used)
In addition, a progress notification can be enabled using a PROGRESS
<each n lines> option.

I tested this with a table, containing 2000 rows with a highly
compressable bytea column (size 1.4GB, on-disk 138MB). Numbers are as
follows (8.2 HEAD psql):
pg_dump -a -F c -t 652s, 146MB
\copy TO /dev/null 322s
\copy TO /dev/null binary 24s
\copy TO /dev/null compression 108s
\copy TO /tmp/file binary 55s, 1.4GB
\copy TO /tmp/file compression 108s, 133MB
\copy TO STDOUT binary|gzip -1 69s, 117MB

So using the plain text copy has a large overhead for text data over
binary formats. OTOH, copying normal rows WITH BINARY may bloat the
result too. A typical test table gave these numbers:
COPY: 6014 Bytes
BINARY: 15071 Bytes
COMPRESSION: 2334 Bytes

The compression (pg_lzcompress) is less efficient than a binary copy
piped to gzip, as long as the data transfer of 1.4GB from server to
client isn't limited by network bandwidth. Apparently, pg_lzcompress
uses 53s to compress to 133MB, while gzip only needs 14s for 117MB.
Might be worth to have a look optimizing that since it's used in
tuptoaster. Still, when network traffic is involved, it may be better to
have some time spent on the server to reduce data (e.g. for Slony, which
uses COPY to start a replication, and is likely to be operated over
lines <1GBit/s).

The attached patch implements COPY ... WITH [BINARY] COMPRESSION
(compression implies BINARY). The copy data uses bit 17 of the flag
field to identify compressed data.
The PROGRESS <n> option to throw notices each n lines has a caveat: when
copying TO STDOUT, data transfer will cease after the first notice was
sent. This may either mean "dont ereport(NOTICE) when COPYing data to
the client" or a bug somewhere.

Regards,
Andreas

Attachment Content-Type Size
copy-compression.patch text/plain 21.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-05-31 09:47:26 Re: [PATCHES] Magic block for modules
Previous Message Bruce Momjian 2006-05-31 09:27:34 Re: Compile libpq with vc8