Re: libpq compression

From: Daniil Zakhlystov <usernamedt(at)yandex-team(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Subject: Re: libpq compression
Date: 2021-03-18 19:30:09
Message-ID: 161609580905.28624.5304095609680400810.pgcf@coridan.postgresql.org
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi,

I've compared the different libpq compression approaches in the streaming physical replication scenario.

Test setup
Three hosts: the first runs pg_restore, the second is the master, and the third is the standby replica.
In each test run, I restored the IMDB database (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/2QYZBT) with pg_restore
and measured the traffic received on the standby replica.

Also, I enlarged ZPQ_BUFFER_SIZE in all versions, because the small default buffer size (8192 bytes) led to more
socket read/write system calls and to poor compression in the chunked-reset scenario.

Scenarios:

chunked
uses streaming compression, wraps the compressed data in CompressedData messages, and preserves the compression context across CompressedData messages.
https://github.com/usernamedt/libpq_compression/tree/chunked-compression

chunked-reset
uses streaming compression, wraps the compressed data in CompressedData messages, and resets the compression context on each CompressedData message.
https://github.com/usernamedt/libpq_compression/tree/chunked-reset

permanent
uses streaming compression and sends the raw compressed stream without any wrapping.
https://github.com/usernamedt/libpq_compression/tree/permanent-w-enlarged-buffer
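To illustrate why preserving the compression context matters, here is a minimal sketch of the chunked vs. chunked-reset trade-off. It uses Python's stdlib zlib as a stand-in for ZSTD (the patch itself uses ZSTD via its streaming API; zlib is chosen here only because it is always available), compressing a series of small, similar "messages" with and without a shared context:

```python
import zlib

# 200 small "messages" with repeated structure, standing in for
# protocol messages in a replication stream.
messages = [("INSERT INTO films VALUES (%d, 'title', 'drama');" % i).encode()
            for i in range(200)]

def chunked_size(msgs):
    """'chunked' scenario: one shared compressor, so the history window
    survives across messages. Z_SYNC_FLUSH produces a complete flush
    point per message without discarding that history."""
    c = zlib.compressobj(5)
    total = 0
    for m in msgs:
        total += len(c.compress(m)) + len(c.flush(zlib.Z_SYNC_FLUSH))
    return total

def chunked_reset_size(msgs):
    """'chunked-reset' scenario: a fresh compressor per message, so
    every message starts with an empty history window."""
    total = 0
    for m in msgs:
        c = zlib.compressobj(5)
        total += len(c.compress(m)) + len(c.flush())
    return total

if __name__ == "__main__":
    print("chunked:      ", chunked_size(messages), "bytes")
    print("chunked-reset:", chunked_reset_size(messages), "bytes")
```

Because later messages can back-reference earlier ones, the shared-context variant compresses this kind of repetitive stream far better, which matches the gap between chunked and chunked-reset in the measurements below.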

Tested compression levels
ZSTD, level 1
ZSTD, level 5
ZSTD, level 9

Scenario         Replica rx, mean (MB)
uncompressed     6683.6

ZSTD, level 1
Scenario         Replica rx, mean (MB)
chunked-reset    2726.0
chunked          2694.0
permanent        2694.3

ZSTD, level 5
Scenario         Replica rx, mean (MB)
chunked-reset    2234.3
chunked          2123.0
permanent        2115.3

ZSTD, level 9
Scenario         Replica rx, mean (MB)
chunked-reset    2153.6
chunked          1943.0
permanent        1941.6
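To put the tables above in perspective, a short script (numbers copied verbatim from the tables) computes the traffic reduction relative to the uncompressed baseline and the overhead of chunked relative to permanent:

```python
# Replica rx figures (MB) copied from the tables above.
BASELINE_MB = 6683.6  # uncompressed run

RESULTS = {
    "ZSTD level 1": {"chunked-reset": 2726.0, "chunked": 2694.0, "permanent": 2694.3},
    "ZSTD level 5": {"chunked-reset": 2234.3, "chunked": 2123.0, "permanent": 2115.3},
    "ZSTD level 9": {"chunked-reset": 2153.6, "chunked": 1943.0, "permanent": 1941.6},
}

def saving_pct(mb, baseline=BASELINE_MB):
    """Traffic reduction versus the uncompressed baseline, in percent."""
    return 100.0 * (1.0 - mb / baseline)

def chunked_overhead_pct(level):
    """Extra traffic of chunked relative to permanent, in percent."""
    r = RESULTS[level]
    return 100.0 * (r["chunked"] / r["permanent"] - 1.0)

for level, scenarios in RESULTS.items():
    for scenario, mb in scenarios.items():
        print(f"{level}, {scenario}: {saving_pct(mb):.1f}% less traffic")
    print(f"{level}: chunked vs permanent overhead {chunked_overhead_pct(level):+.2f}%")
```

At every level, chunked stays within a fraction of a percent of permanent, while chunked-reset gives up several percentage points of compression at the higher levels.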

A full report with additional data and resource usage graphs is available here:
https://docs.google.com/document/d/1a5bj0jhtFMWRKQqwu9ag1PgDF5fLo7Ayrw3Uh53VEbs

Based on these results, I suggest sticking with the chunked compression approach,
which offers more flexibility and adds almost no overhead compared to permanent compression.
Also, we could later introduce a setting to control whether the compression context is reset on each message,
without breaking backward compatibility.

--
Daniil Zakhlystov

The new status of this patch is: Ready for Committer
