Re: libpq compression

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robbie Harwood <rharwood(at)redhat(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq compression
Date: 2018-06-19 07:54:38
Lists: pgsql-hackers

On 18.06.2018 23:34, Robbie Harwood wrote:
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>> On 06.06.2018 02:03, Thomas Munro wrote:
>>> On Wed, Jun 6, 2018 at 2:06 AM, Konstantin Knizhnik
>>> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>>> Thank you for review. Updated version of the patch fixing all reported
>>>> problems is attached.
>>> Small problem on Windows[1]:
>>> C:\projects\postgresql\src\include\common/zpq_stream.h(17): error
>>> C2143: syntax error : missing ')' before '*'
>>> [C:\projects\postgresql\libpq.vcxproj]
>>> 2395
>>> You used ssize_t in zpq_stream.h, but Windows doesn't have that type.
>>> We have our own typedef in win32_port.h. Perhaps zpq_stream.c should
>>> include postgres.h/postgres_fe.h (depending on FRONTEND) like the
>>> other .c files in src/common, before it includes zpq_stream.h?
>>> Instead of "c.h".
>>> [1]
>> Thank you very much for reporting the problem.
>> I attached new patch with include of postgres_fe.h added to zpq_stream.c
> Hello!
> Due to being in a similar place, I'm offering some code review. I'm
> excited that you're looking at throughput on the network stack - it's
> not usually what we think of in database performance. Here are some
> first thoughts, which have some overlap with what others have said on
> this thread already:
> ###
> This build still doesn't pass Windows:
> You can find more about what the bot is doing here:
Thank you.
Looks like I found the reason: has to be patched.

> ###
> I have a few misgivings about pq_configure(), starting with the name.
> The *only* thing this function does is set up compression, so it's
> mis-named. (There's no point in making something generic unless it's
> needed - it's just confusing.)
Well, my intention was that this function *may* in the future perform some
other configuration settings not related to compression.
And it is better to encapsulate this knowledge in pqcomm rather than
make the postmaster (BackendStartup) worry about it.
But I can rename this function to pq_configure_compression() or whatever
you prefer.

> I also don't like that you've injected into the *startup* path - before
> authentication takes place. Fundamentally, authentication (if it
> happens) consists of exchanging some combination of short strings (e.g.,
> usernames) and binary blobs (e.g., keys). None of this will compress
> well, so I don't see it as worth performing this negotiation there - it
> can wait. It's also another message in every startup. I'd leave it to
> connection parameters, personally, but up to you.

From my point of view, compression of libpq traffic is similar to SSL
and should be toggled at the same stage.
Definitely the authentication parameters are not large enough to be
efficiently compressed, but compression (maybe in the future password
protected) can somehow protect this data.
In any case I do not think that compression of authentication data can
have any influence on negotiation speed.
So I am not 100% sure that toggling compression just after receiving the
startup packet is the only right solution.
But neither am I convinced that there is some better place where the
compressor should be configured.
Do you have a concrete suggestion for it? In InitPostgres, just after
PerformAuthentication?
Also please notice that compression is useful not only for client-server
communication, but also for replication channels.
Right now it is definitely used in both cases, but if we move
pq_configure somewhere else, we should check that this code is invoked
both for normal backends and for walsender.

> ###
> Documentation! You're going to need it. There needs to be enough
> around for other people to implement the protocol (or if you prefer,
> enough for us to debug the protocol as it exists).
> In conjunction with that, please add information on how to set up
> compressed vs. uncompressed connections - similarly to how we've
> documentation on setting up TLS connection (though presumably compressed
> connection documentation will be shorter).

Sorry, I will definitely add documentation for configuring compression.

> ###
> Using terminology from :
> Right now you use streaming (ZSTD_{compress,decompress}Stream()) as the
> basis for your API. I think this is probably a mismatch for what we're
> doing here - we're doing one-shot compression/decompression of packets,
> not sending video or something.
> I think our use case is better served by the non-streaming interface, or
> what they call the "Simple API" (ZSTD_{decompress,compress}()).
> Documentation suggests it may be worth it to keep an explicit context
> around and use that interface instead (i.e.,
> ZSTD_{compressCCTx,decompressDCtx}()), but that's something you'll have
> to figure out.
> You may find making this change helpful for addressing the next issue.

Sorry, but here I completely disagree with you.
What we are doing is really streaming compression, not compression of
individual messages/packets.
Yes, it is not video, but COPY data actually has the same nature as
video data.
The main benefit of streaming compression is that we can use the same
dictionary for compressing all messages (and adjust this dictionary
based on new data).
We do not need to write a dictionary and a separate header for each record.
Otherwise compression of libpq messages would be completely useless: the
typical message size is too short to be efficiently compressed. The
main drawback of streaming compression is that you can not decompress
some particular message without decompressing all previous messages.
This is why streaming compression can not be used to compress database
pages (as it is done in CFS, provided in PostgresPro EE). But for libpq
it is not needed.
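To illustrate the point (a sketch only, using Python's stdlib zlib as a
stand-in for the zstd streaming API in the patch): with one shared
streaming context, every message after the first benefits from the
dictionary built up by its predecessors, while one-shot compression pays
header and dictionary cost for every short message.

```python
import zlib

# 1000 short, similar libpq-style messages (e.g. DataRow packets)
messages = [b"D\x00\x00\x00\x20row-%04d-value" % i for i in range(1000)]

# One-shot: each message compressed independently; each pays the
# stream header and starts with an empty dictionary
oneshot = sum(len(zlib.compress(m)) for m in messages)

# Streaming: a single shared context; Z_SYNC_FLUSH makes each message
# decodable as soon as it arrives, while the dictionary persists
comp = zlib.compressobj()
streaming = sum(len(comp.compress(m)) + len(comp.flush(zlib.Z_SYNC_FLUSH))
                for m in messages)

print(oneshot, streaming)  # streaming total is several times smaller
```

The same trade-off noted above applies: a later message can only be
decompressed after all earlier ones, which is fine for a byte stream
like libpq but not for randomly accessed pages.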

> ###
> I don't like where you've put the entry points to the compression logic:
> it's a layering violation. A couple people have expressed similar
> reservations I think, so let me see if I can explain using
> `pqsecure_read()` as an example. In pseudocode, `pqsecure_read()` looks
> like this:
> if conn->is_tls:
>     n = tls_read(conn, ptr, len)
> else:
>     n = pqsecure_raw_read(conn, ptr, len)
> return n
> I want to see this extended by your code to something like:
> if conn->is_tls:
>     n = tls_read(conn, ptr, len)
> else:
>     n = pqsecure_raw_read(conn, ptr, len)
> if conn->is_compressed:
>     n = decompress(ptr, n)
> return n
> In conjunction with the above change, this should also significantly
> reduce the size of the patch (I think).

Yes, it would simplify the patch. But it would also make libpq
compression completely useless (see my explanation above).
We need to use streaming compression, and to be able to use streaming
compression I have to pass a function for fetching more data to the
compression library.

> ###
> The compression flag has proven somewhat contentious, as you've already
> seen on this thread. I recommend removing it for now and putting it in
> a separate patch to be merged later, since it's separable.
> ###
> It's not worth flagging style violations in your code right now, but you
> should be aware that there are quite a few style and whitespace
> problems. In particular, please be sure that you're using hard tabs
> when appropriate, and that line lengths are fewer than 80 characters
> (unless they contain error messages), and that pointers are correctly
> declared (`void *arg`, not `void* arg`).
> ###

Ok, I will fix it.

> Thanks,
> --Robbie

Konstantin Knizhnik
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
libpq-compression-6.patch text/x-patch 37.7 KB
