Re: pglz performance

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-05-13 07:14:27
Message-ID: 20190513071427.GB2273@paquier.xyz
Lists: pgsql-hackers

On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there
> is a big room for improvement in performance of pglz compression and
> decompression.

Yes, I believe so too. pglz is a huge CPU consumer when it comes to
compression compared to more modern algorithms like lz4.

> With Vladimir we started to investigate ways to boost byte copying
> and eventually created a test suite[1] to investigate the performance
> of compression and decompression. This is an extension with a single
> function test_pglz() which performs tests for different:
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations

Cool. I got something rather similar in my wallet of plugins:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
This is something I worked on mainly for FPW compression in WAL.

> Currently we test mostly decompression improvements against two WALs
> and one data file taken from a pgbench-generated database. Any
> suggestions on more relevant data payloads are very welcome.

Text strings made of random data and of variable length? For any test
of this kind I think that it is good to focus on the performance of
the low-level calls, even going as far as a simple C wrapper on top of
the pglz APIs so as to measure only compression and decompression
themselves, without extra PG-related overhead like palloc() getting in
the way. String lengths from 1kB up to 16kB may be a reasonable range,
and it is important to reuse the same uncompressed strings across runs
so that comparisons are fair.
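
Something like the following standalone wrapper is what I have in mind
(a rough, untested sketch only: pglz_decompress() grew a check_complete
argument during the v12 cycle, and the payload contents, size and loop
count are arbitrary placeholders):

/*
 * Rough sketch of a standalone pglz micro-benchmark (an assumption of
 * how such a wrapper could look, not code from the thread).  It is a
 * frontend program linked against libpgcommon, so no palloc() is
 * involved.  pglz_decompress() has an extra check_complete argument
 * on newer branches, so the call below may need adjusting.
 */
#include "postgres_fe.h"
#include "common/pg_lzcompress.h"

#include <time.h>

#define PAYLOAD_LEN 8192        /* somewhere in the 1kB..16kB range */
#define LOOPS       100000      /* arbitrary number of iterations */

int
main(void)
{
    static char src[PAYLOAD_LEN];
    static char compressed[PGLZ_MAX_OUTPUT(PAYLOAD_LEN)];
    static char decompressed[PAYLOAD_LEN];
    int32       clen;
    clock_t     start;
    clock_t     end;
    int         i;

    /*
     * Compressible payload, kept identical across runs so that results
     * are comparable.  Real tests should also cover random and mixed
     * data.
     */
    for (i = 0; i < PAYLOAD_LEN; i++)
        src[i] = "abcabcabcx"[i % 10];

    clen = pglz_compress(src, PAYLOAD_LEN, compressed, PGLZ_strategy_always);
    if (clen < 0)
    {
        fprintf(stderr, "input not compressible\n");
        return 1;
    }

    start = clock();
    for (i = 0; i < LOOPS; i++)
        (void) pglz_decompress(compressed, clen, decompressed, PAYLOAD_LEN);
    end = clock();

    printf("decompressed %.1f MB in %.2f s\n",
           (double) PAYLOAD_LEN * LOOPS / (1024.0 * 1024.0),
           (double) (end - start) / CLOCKS_PER_SEC);
    return 0;
}

Timing compression is the same exercise with pglz_compress() moved
inside the loop instead.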

> My laptop tests show that our decompression implementation [2] can
> be from 15% to 50% faster. Also I've noted that compression is
> extremely slow, ~30 times slower than decompression. I believe we
> can do something about it.

That's nice.

> We focus only on boosting existing codec without any considerations
> of other compression algorithms.

There is this as well. A couple of algorithms have a license
compatible with Postgres, but it may be simpler to just improve
pglz. A 10%~20% improvement is something worth doing.

> Most important questions are:
> 1. What are relevant data sets?
> 2. What are relevant CPUs? I have only Xeon-based servers and a few
> laptops/desktops with Intel CPUs
> 3. If compression is 30 times slower, should we better focus on
> compression instead of decompression?

Decompression can matter a lot for mostly-read workloads, while
compression can become a bottleneck for insert-heavy loads, so
improving compression and improving decompression should be treated
as two separate problems rather than one linked problem. Any
improvement in one or the other, or even both, is nice to have.
--
Michael
