Skip site navigation (1) Skip section navigation (2)

copy with compression progress n

From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: copy with compression progress n
Date: 2006-05-31 09:38:05
Message-ID: 447D63FD.9060609@pse-consulting.de (view raw or flat)
Thread:
Lists: pgsql-hackers
I've been playing around with COPYing large binary data, and implemented 
a COMPRESSION transfer format. The server side compression saves 
significant bandwidth, which may be the major limiting factor when large 
amounts of data is involved (i.e. in many cases where COPY TO/FROM 
STDIN/STDOUT is used)
In addition, a progress notification can be enabled using a PROGRESS 
<each n lines> option.

I tested this with a table, containing 2000 rows with a highly 
compressable bytea column (size 1.4GB, on-disk 138MB). Numbers are as 
follows (8.2 HEAD psql):
pg_dump -a -F c -t		652s, 146MB
\copy TO /dev/null		322s
\copy TO /dev/null binary	24s
\copy TO /dev/null compression	108s
\copy TO /tmp/file binary	55s, 1.4GB
\copy TO /tmp/file compression	108s, 133MB
\copy TO STDOUT binary|gzip -1	69s, 117MB

So using the plain text copy has a large overhead for text data over 
binary formats. OTOH, copying normal rows WITH BINARY may bloat the 
result too. A typical test table gave these numbers:
COPY: 		6014 Bytes
BINARY:		15071 Bytes
COMPRESSION:	2334 Bytes

The compression (pg_lzcompress) is less efficient than a binary copy 
piped to gzip, as long as the data transfer of 1.4GB from server to 
client isn't limited by network bandwidth. Apparently, pg_lzcompress 
uses 53s to compress to 133MB, while gzip only needs 14s for 117MB. 
Might be worth to have a look optimizing that since it's used in 
tuptoaster. Still, when network traffic is involved, it may be better to 
have some time spent on the server to reduce data (e.g. for Slony, which 
uses COPY to start a replication, and is likely to be operated over 
lines <1GBit/s).

The attached patch implements COPY ... WITH [BINARY] COMPRESSION 
(compression implies BINARY). The copy data uses bit 17 of the flag 
field to identify compressed data.
The PROGRESS <n> option to throw notices each n lines has a caveat: when 
copying TO STDOUT, data transfer will cease after the first notice was 
sent. This may either mean "dont ereport(NOTICE) when COPYing data to 
the client" or a bug somewhere.

Regards,
Andreas

Attachment: copy-compression.patch
Description: text/plain (21.0 KB)

Responses

pgsql-hackers by date

Next:From: Bruce MomjianDate: 2006-05-31 09:47:26
Subject: Re: [PATCHES] Magic block for modules
Previous:From: Bruce MomjianDate: 2006-05-31 09:27:34
Subject: Re: Compile libpq with vc8

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group