Re: [PATCH] COPY .. COMPRESSED

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] COPY .. COMPRESSED
Date: 2013-01-15 22:46:57
Message-ID: 4912.1358290017@sss.pgh.pa.us
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> I find the argument that this supports compression-over-the-wire to be
>> quite weak, because COPY is only one form of bulk data transfer, and
>> one that a lot of applications don't ever use. If we think we need to
>> support transmission compression for ourselves, it ought to be
>> integrated at the wire protocol level, not in COPY.

> As far as I can tell, COPY is the option that is strongly recommended
> for bulk data operations. I can see the use case for wanting SELECT
> results to be compressed, but it strikes me as the 10% case for PG users
> rather than the 90% one. Ditto for COPY vs. large INSERT .. VALUES.

Really? Given that libpq provides no useful support for doing anything
with COPY data, much less do higher-level packages such as Perl DBI, I'd
venture that the real-world ratio is more like 90/10. If not 99/1.
There might be a few souls out there who are hardy enough and concerned
enough with performance to have made their apps speak COPY protocol,
and not given up on it the first time they hit a quoting/escaping bug
... but not many, I bet.

> Compressing every small packet seems like it'd be overkill and might
> surprise people by actually reducing performance in the case of lots of
> small requests.

Yeah, proper selection and integration of a compression method would be
critical, which is one reason that I'm not suggesting a plugin for this.
You couldn't expect any-random-compressor to work well. I think zlib
would be okay though when making use of its stream compression features.
The key thing there is to force a stream buffer flush (a "sync flush",
Z_SYNC_FLUSH, in zlib's terms) exactly when
we're about to do a flush to the socket. That way we get cross-packet
compression but don't have a problem with the compressor failing to send
the last partial message when we need it to.
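
For concreteness, here is a rough sketch of that pattern against zlib's
deflate API; the compress_and_flush and socket_write names are made up
for illustration, not anything that exists in the backend today:

#include <string.h>
#include <zlib.h>

/* Caller-provided: write all "len" bytes to the client socket. */
extern void socket_write(const char *buf, size_t len);

static z_stream zs;         /* one deflate stream per connection */
static int      zs_inited = 0;

static void
compress_and_flush(char *data, size_t len)
{
    char out[8192];

    if (!zs_inited)
    {
        memset(&zs, 0, sizeof(zs));
        if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
            return;         /* real code would report the failure */
        zs_inited = 1;
    }

    zs.next_in = (Bytef *) data;
    zs.avail_in = (uInt) len;

    /*
     * Z_SYNC_FLUSH makes deflate emit everything consumed so far and end
     * on a byte boundary, so the client can decode the complete message
     * now, while the compression dictionary survives for cross-packet
     * compression on later calls.
     */
    do
    {
        int     rc;

        zs.next_out = (Bytef *) out;
        zs.avail_out = sizeof(out);
        rc = deflate(&zs, Z_SYNC_FLUSH);
        socket_write(out, sizeof(out) - zs.avail_out);
        if (rc != Z_OK)
            break;
    } while (zs.avail_out == 0);
}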

(My suggestion of an expansible option is for future-proofing, not
because I think we'd try to support more than one option today.)

regards, tom lane
