Re: pglz performance

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-08-04 00:41:24
Message-ID: d8576096-76ba-487d-515b-44fdedba8bb5@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 02/08/2019 21:48, Tomas Vondra wrote:
> On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:
>
>>
>>> Another question is whether we'd actually want to include the code in
>>> core directly, or use system libraries (and if some packagers might
>>> decide to disable that, for whatever reason).
>>
>> I'd personally say we should have an included version, and a
>> --with-system-... flag that uses the system one.
>>
>
> OK. I'd say to require a system library, but that's a minor detail.
>

Same here.

Just so that we don't idly talk, what do you think about the attached?
It:
- adds a new GUC, compression_algorithm, with possible values pglz
(default) and lz4 (if lz4 is compiled in); changing it requires SIGHUP
- adds a --with-lz4 configure option (default yes, so the option you
would actually pass is --without-lz4) that enables lz4; it uses the
system library
- uses the compression_algorithm for both TOAST and WAL compression (if on)
- supports slicing for lz4 as well (pglz already supported it)
- supports reading old TOAST values
- adds a 1 byte header to the compressed data where we currently store
the algorithm kind; that leaves us with 254 more to add :) (that's extra
overhead compared to the current state)
- changes the rawsize in the TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format
- supports reading from a table whose rows are stored with different
algorithms (so that the GUC itself can be freely changed)
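
As a rough illustration of the header layout described in the list above
(a sketch only, not code from the attached patch): the names below are
hypothetical, and the choice of the top bit as the format flag is an
assumption. The idea is that rawsize keeps 31 bits, the remaining bit
marks the new format, and only new-format values carry the 1 byte
algorithm header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; none of these are taken from the patch. */
#define SKETCH_NEW_FORMAT_FLAG UINT32_C(0x80000000) /* assumed: top bit */
#define SKETCH_RAWSIZE_MASK    UINT32_C(0x7FFFFFFF) /* low 31 bits */

typedef enum SketchCompressionAlgorithm
{
    SKETCH_ALGO_PGLZ = 0,
    SKETCH_ALGO_LZ4 = 1
} SketchCompressionAlgorithm;

/* Pack a raw (uncompressed) size and the format flag into one 32-bit word. */
static inline uint32_t
sketch_pack_rawsize(uint32_t rawsize, bool new_format)
{
    assert(rawsize <= SKETCH_RAWSIZE_MASK);   /* must fit in 31 bits */
    return rawsize | (new_format ? SKETCH_NEW_FORMAT_FLAG : 0);
}

/* Recover the raw size, ignoring the format flag. */
static inline uint32_t
sketch_unpack_rawsize(uint32_t word)
{
    return word & SKETCH_RAWSIZE_MASK;
}

/* True if the word was written in the new format. */
static inline bool
sketch_is_new_format(uint32_t word)
{
    return (word & SKETCH_NEW_FORMAT_FLAG) != 0;
}

/*
 * Decide which algorithm a stored value uses.  Old-format values have no
 * algorithm byte and can only be pglz; new-format values store the
 * algorithm kind in the first byte of the compressed payload.
 */
static inline int
sketch_read_algorithm(const unsigned char *payload, uint32_t word)
{
    if (!sketch_is_new_format(word))
        return SKETCH_ALGO_PGLZ;
    return (int) payload[0];
}
```

Because the algorithm is read from each stored value rather than from
the GUC, rows written under different settings can coexist in one table.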

Simple docs and a TAP test included.

I did some basic performance testing (it's not really my thing though,
so I would appreciate it if somebody did more).
I get about 7x perf improvement on data load with lz4 compared to pglz
on my dataset but strangely only tiny decompression improvement. Perhaps
more importantly I also did before patch and after patch tests with pglz
and the performance difference with my data set was <1%.

Note that this will just link against lz4; it does not add lz4 into
the PostgreSQL code base.

The issues I know of:
- the pg_decompress function really ought to throw an error in the
default branch, but that file is also used in the front-end, so I'm not
sure how to do that
- the TAP test probably does not work with all possible configurations
(but that's why it needs to be set in PG_TEST_EXTRA like for example ssl)
- we don't really have any automated test for reading old TOAST format,
no idea how to do that
- I expect my changes to configure.in are not the greatest, as I have
pretty much zero experience with autoconf
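
One possible way around the default-branch problem in the first item
above, sketched under assumptions and not taken from the patch: have the
dispatcher return -1 for an unknown algorithm and let each caller decide
how to report it, so the shared file needs no error-reporting call. All
names are hypothetical, and the copy routine is only a stand-in for the
real decompressors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; not taken from the patch. */
enum
{
    DEMO_ALGO_PGLZ = 0,
    DEMO_ALGO_LZ4 = 1
};

/*
 * Stand-in for a real decompressor: it only copies bytes.  Real code
 * would call the pglz or lz4 decompression routine here instead.
 */
static int
demo_copy_decompress(const char *src, int srclen, char *dst, int dstcap)
{
    if (srclen < 0 || srclen > dstcap)
        return -1;
    memcpy(dst, src, (size_t) srclen);
    return srclen;
}

/*
 * Dispatch on the stored algorithm kind.  Returns the number of bytes
 * written to dst, or -1 on failure.  An unknown algorithm is reported
 * through the return value, so this function contains nothing that is
 * specific to backend or front-end error handling.
 */
int
demo_decompress(int algorithm, const char *src, int srclen,
                char *dst, int dstcap)
{
    assert(src != NULL && dst != NULL);

    switch (algorithm)
    {
        case DEMO_ALGO_PGLZ:
        case DEMO_ALGO_LZ4:
            return demo_copy_decompress(src, srclen, dst, dstcap);
        default:
            return -1;          /* unknown algorithm: caller reports it */
    }
}
```

A backend caller could then turn a negative result into an error, while
a front-end caller prints a message and exits.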

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/

Attachment Content-Type Size
0001-Add-new-GUC-compression_algorithm.patch text/x-patch 34.7 KB
