Re: libpq compression

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject: Re: libpq compression
Date: 2020-11-24 18:35:28
Message-ID: CA+TgmoY5AMq8auemc0g3geCpVDeRnawW1A-jUeqT-5cE-_2MiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
<usernamedt(at)yandex-team(dot)ru> wrote:
> To sum up, I think that the current implementation already introduces good benefits. As I proposed in the Usability review, we may introduce the new approaches later as separate compression 'algorithms'.

I don't think the current patch is so close to being committable that
we shouldn't be considering what we really want to have here. It's one
thing to say, well, this patch is basically done, let's not start
redesigning it now. But that's not the case here. For example, I don't
see any committer accepting the comments in zpq_stream.c as adequate,
or the documentation, either. Some comments that have been made
previously, like Andres's remark about the non-standard message
construction in pq_configure(), have not been addressed, and I do not
think any committer is going to agree with the idea that the novel
method chosen by the patch is superior here, not least but not only
because it seems like it's endian-dependent. That function also uses
goto, which anybody thinking of committing this will surely try to get
rid of, and I'm pretty sure the sscanf() isn't good enough to reject
trailing garbage, and the error message that follows is improperly
capitalized. I'm sure there's other stuff, too: this is just based on
a quick look.

Before we start worrying about any of that stuff in too much detail, I
think it makes a lot of sense to step back and consider the design.
Honestly, the work of changing the design might be smaller than the
amount of cleanup the patch needs. But even if it's larger, it's
probably not vastly larger. And in any case, I quite disagree with the
idea that we should commit to a user-visible interface that exposes a
subset of the functionality that we needed and then try to glue the
rest of the functionality on top of it later. If we make a libpq
connection option called compression that controls the type of
compression that is used in both direction, then how exactly would we
extend that later to allow for different compression in the two
directions? Some syntax like compression=zlib/none, where the value
before the slash controls one direction and the value after the slash
controls the other? Maybe. But on the other hand, maybe it's better to
have separate connection options for client compression and server
compression. Or, maybe the kind of compression used by the server
should be controlled via a GUC rather than a connection option. Or,
maybe none of that is right and we should stick with the approach the
patch currently takes. But it's not like we can do something for v1
and then just change things randomly later: there will be
backward-compatibility to worry about. So the time to talk about the
general approach here is now, before anything gets committed, before
the project has committed itself to any particular design. If we
decide in that discussion that certain things can be left for the
future, that's fine. If we've have discussed how they could be added
without breaking backward compatibility, even better. But we can't
just skip over having that discussion.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Zhang 2020-11-24 18:36:33 Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
Previous Message Tom Lane 2020-11-24 18:33:30 Re: mark/restore failures on unsorted merge joins