libpq compression

From: Euler Taveira <euler(at)timbira(dot)com>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: libpq compression
Date: 2012-06-14 04:33:19
Message-ID: 4FD9698F.2090407@timbira.com
Lists: pgsql-hackers

Hi,

There has already been some discussion about compressing libpq data [1][2][3].
Recently I faced a scenario that would have been less problematic if we had
compression support: frequent data loads (COPY) over slow/unstable links, to
be run on a few hundred PostgreSQL servers all over Brazil. Someone could
argue that I could use an ssh tunnel to solve the problem, but (i) it is
complex because it involves opening a different port in the firewall and
(ii) this is an opportunity to improve other scenarios too, such as reducing
bandwidth consumption during replication or normal operation over
slow/unstable links.

AFAICS there are no objections to implementing compression in libpq. The
problem is which algorithm to use, since there are a lot of patents in this
area. As others pointed out at [4], we should not implement algorithms in
core that possibly infringe patents. Derived products are free to plug in
whatever algorithms they want; there will be an API for that.

This work will be sponsored by a company that is interested in this feature.

=== Design ===

- algorithm: zlib, bzip2 (another patent-free, BSD-licensed one?)
- compile-time option: --with-bzip2
- PGCOMPRESSMODE environment variable
  * disable: only try a non-compressed connection (default)
  * prefer: try a compressed connection; if that fails, try a
    non-compressed connection
  * require: only try a compressed connection
- PGCOMPRESSALGO environment variable
  * zlib
  * bzip2
- compressmode and compressalgo connection string parameters
- compress all data
- compress before send() and decompress after recv()
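
To make the mode semantics in the list above concrete, here is a rough sketch
in Python (not libpq code). The function names, the two callables, the rule
that connection string values win over environment variables, and zlib as the
default algorithm are all assumptions for illustration; the proposal does not
specify them.

```python
import os

VALID_MODES = ("disable", "prefer", "require")
VALID_ALGOS = ("zlib", "bzip2")


def resolve_compression(conn_params, environ=None):
    """Pick (mode, algo) from connection parameters, then the environment.

    Assumption: connection string values win over environment variables,
    and the default algorithm is zlib; the proposal states neither.
    """
    env = os.environ if environ is None else environ
    mode = conn_params.get("compressmode") or env.get("PGCOMPRESSMODE") or "disable"
    algo = conn_params.get("compressalgo") or env.get("PGCOMPRESSALGO") or "zlib"
    if mode not in VALID_MODES:
        raise ValueError(f"invalid compressmode: {mode!r}")
    if algo not in VALID_ALGOS:
        raise ValueError(f"invalid compressalgo: {algo!r}")
    return mode, algo


def connect(mode, try_compressed, try_plain):
    """Apply the disable/prefer/require rules to two hypothetical callables."""
    if mode == "disable":
        return try_plain()
    try:
        return try_compressed()
    except ConnectionError:
        if mode == "require":
            raise  # "require": never fall back
        return try_plain()  # "prefer": fall back to a non-compressed connection
```

With PGCOMPRESSMODE=prefer, a server that cannot negotiate compression would
still be reachable; with require, the same failure is reported to the caller.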

I am all ears for suggestions to improve this design. Some of my choices are
based on my research into compression in protocols and into PostgreSQL
internals.

Keep in mind that I prefer compressing all data instead of a selected set of
messages because (i) every new data message gets compression support without
extra code and (ii) it keeps the protocol code from turning into spaghetti.

I'll try to post a patch soon with the ideas discussed in this thread.

[1] http://archives.postgresql.org/pgsql-hackers/2012-03/msg00929.php
[2] http://archives.postgresql.org/pgsql-hackers/2011-01/msg00337.php
[3] http://archives.postgresql.org/pgsql-hackers/2002-03/msg00664.php
[4] http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
