From: | Andreas Pflug <pgadmin(at)pse-consulting(dot)de> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | copy with compression progress n |
Date: | 2006-05-31 09:38:05 |
Message-ID: | 447D63FD.9060609@pse-consulting.de |
Lists: | pgsql-hackers |
I've been playing around with COPYing large binary data, and implemented
a COMPRESSION transfer format. Server-side compression saves
significant bandwidth, which may be the major limiting factor when large
amounts of data are involved (i.e. in many cases where COPY TO/FROM
STDIN/STDOUT is used).
In addition, a progress notification can be enabled using a PROGRESS
<each n lines> option.
I tested this with a table containing 2000 rows with a highly
compressible bytea column (size 1.4GB, on-disk 138MB). Numbers are as
follows (8.2 HEAD psql):
pg_dump -a -F c -t                   652s, 146MB
\copy TO /dev/null                   322s
\copy TO /dev/null binary             24s
\copy TO /dev/null compression       108s
\copy TO /tmp/file binary             55s, 1.4GB
\copy TO /tmp/file compression       108s, 133MB
\copy TO STDOUT binary|gzip -1        69s, 117MB
So the plain text copy has a large overhead compared with the binary
formats here. OTOH, copying normal rows WITH BINARY may bloat the
result too. A typical test table gave these numbers:
COPY: 6014 Bytes
BINARY: 15071 Bytes
COMPRESSION: 2334 Bytes
The compression (pg_lzcompress) is less efficient than a binary copy
piped to gzip, as long as the data transfer of 1.4GB from server to
client isn't limited by network bandwidth. Apparently, pg_lzcompress
takes 53s to compress to 133MB, while gzip only needs 14s for 117MB.
It might be worth looking into optimizing that, since it's used in
tuptoaster. Still, when network traffic is involved, it may be better to
spend some time on the server to reduce the data (e.g. for Slony, which
uses COPY to start a replication, and is likely to be operated over
links slower than 1GBit/s).
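The 53s and 14s figures above appear to come from subtracting the 55s binary-to-file time from the two compressed runs. The sketch below redoes that arithmetic and compares the compression cost with an idealized wire time. The link speeds, the reading of 1.4GB as 1400MB, and the helper names are assumptions made for illustration, not measurements from the test.

```python
# Timings (seconds) and sizes (MB) copied from the measurements in this post.
BINARY_TO_FILE_S = 55         # \copy TO /tmp/file binary
COMPRESSION_TO_FILE_S = 108   # \copy TO /tmp/file compression
BINARY_GZIP_S = 69            # \copy TO STDOUT binary|gzip -1

RAW_MB = 1400                 # 1.4GB binary output (assumed 1GB = 1000MB)
LZ_MB = 133                   # pg_lzcompress output


def compression_overhead(total_s, baseline_s=BINARY_TO_FILE_S):
    """Time attributed to compression: total run minus the plain binary copy."""
    return total_s - baseline_s


def transfer_seconds(size_mb, link_mbit_per_s):
    """Idealized wire time for size_mb megabytes, ignoring protocol overhead."""
    return size_mb * 8 / link_mbit_per_s


lz_overhead = compression_overhead(COMPRESSION_TO_FILE_S)
gzip_overhead = compression_overhead(BINARY_GZIP_S)
print(f"pg_lzcompress overhead: {lz_overhead}s, gzip -1 overhead: {gzip_overhead}s")

# Hypothetical link speeds in Mbit/s; none of these were part of the test.
for link in (10, 100, 1000):
    saved = transfer_seconds(RAW_MB, link) - transfer_seconds(LZ_MB, link)
    verdict = "pays off" if saved > lz_overhead else "does not pay off"
    print(f"{link:>5} Mbit/s: wire time saved {saved:7.1f}s, compression {verdict}")
```

Under this crude model, server-side compression wins once the wire time saved exceeds the 53s spent compressing, which is the case for the slower links and not for the 1GBit/s one.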
The attached patch implements COPY ... WITH [BINARY] COMPRESSION
(compression implies BINARY). The copy data uses bit 17 of the flag
field to identify compressed data.
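A client reading such a stream could detect compressed data roughly as sketched below. The header layout (11-byte signature, then a 32-bit flags field and a 32-bit header extension length, both in network byte order) is that of the documented binary COPY format. Treating bit 17 as the compression flag is specific to this patch, and the function names are hypothetical.

```python
import struct

SIGNATURE = b"PGCOPY\n\xff\r\n\x00"   # 11-byte signature of the binary COPY format
FLAG_HAS_OIDS = 1 << 16               # bit 16: OIDs included (stock format)
FLAG_COMPRESSED = 1 << 17             # bit 17: compressed data (this patch only)


def parse_copy_header(data: bytes):
    """Return (flags, extension_length) from the start of a binary COPY stream."""
    if len(data) < len(SIGNATURE) + 8:
        raise ValueError("header too short")
    if not data.startswith(SIGNATURE):
        raise ValueError("not a binary COPY stream")
    # Two unsigned 32-bit integers in network byte order follow the signature.
    flags, ext_len = struct.unpack_from("!II", data, len(SIGNATURE))
    return flags, ext_len


def is_compressed(data: bytes) -> bool:
    """True if the patch's proposed compression bit is set in the flags field."""
    flags, _ = parse_copy_header(data)
    return bool(flags & FLAG_COMPRESSED)


# Usage: build a minimal header with only the compression bit set.
header = SIGNATURE + struct.pack("!II", FLAG_COMPRESSED, 0)
print(is_compressed(header))
```

A reader that does not know bit 17 would be expected to reject the stream, since the upper flag bits of the format are reserved for critical format changes.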
The PROGRESS <n> option to throw notices every n lines has a caveat: when
copying TO STDOUT, data transfer will cease after the first notice is
sent. This may either mean "don't ereport(NOTICE) when COPYing data to
the client" or a bug somewhere.
Regards,
Andreas
Attachment | Content-Type | Size |
---|---|---|
copy-compression.patch | text/plain | 21.0 KB |