Re: pglz performance

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-05-13 07:14:27
Message-ID: 20190513071427.GB2273@paquier.xyz
Lists: pgsql-hackers

On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch[0] and noticed that there
> is a big room for improvement in performance of pglz compression and
> decompression.

Yes, I believe so too. pglz is a huge CPU consumer when it comes to
compression compared to more modern algorithms like lz4.

> With Vladimir we started to investigate ways to boost byte copying
> and eventually created a test suite[1] to investigate the performance
> of compression and decompression. This is an extension with a single
> function test_pglz() which performs tests for different:
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations

Cool. I got something rather similar in my wallet of plugins:
https://github.com/michaelpq/pg_plugins/tree/master/compress_test
This is something I worked on mainly for FPW compression in WAL.

> Currently we test mostly decompression improvements against two WALs
> and one data file taken from a pgbench-generated database. Any
> suggestions on more relevant data payloads are very welcome.

Text strings made of random data and of variable length? For any test
of this kind I think that it is good to focus on the performance of
the low-level calls, even going as far as a simple C wrapper on top of
the pglz APIs so as to measure only compression and decompression
themselves, without extra PG-related overhead like palloc() getting in
the way. String lengths from 1kB up to 16kB may be a reasonable range,
and it is important to reuse the same uncompressed strings across runs
so that comparisons are fair.
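
Something like the following standalone wrapper is what I have in mind
(a rough, untested sketch only: pglz_decompress() grew a check_complete
argument during the v12 cycle, and the payload contents, size and loop
count are arbitrary placeholders):

/*
 * Rough sketch of a standalone pglz micro-benchmark (an assumption of
 * how such a wrapper could look, not code from the thread).  It is a
 * frontend program linked against libpgcommon, so no palloc() is
 * involved.  pglz_decompress() has an extra check_complete argument
 * on newer branches, so the call below may need adjusting.
 */
#include "postgres_fe.h"
#include "common/pg_lzcompress.h"

#include <time.h>

#define PAYLOAD_LEN 8192        /* somewhere in the 1kB..16kB range */
#define LOOPS       100000      /* arbitrary number of iterations */

int
main(void)
{
    static char src[PAYLOAD_LEN];
    static char compressed[PGLZ_MAX_OUTPUT(PAYLOAD_LEN)];
    static char decompressed[PAYLOAD_LEN];
    int32       clen;
    clock_t     start;
    clock_t     end;
    int         i;

    /*
     * Compressible payload, kept identical across runs so that results
     * are comparable.  Real tests should also cover random and mixed
     * data.
     */
    for (i = 0; i < PAYLOAD_LEN; i++)
        src[i] = "abcabcabcx"[i % 10];

    clen = pglz_compress(src, PAYLOAD_LEN, compressed, PGLZ_strategy_always);
    if (clen < 0)
    {
        fprintf(stderr, "input not compressible\n");
        return 1;
    }

    start = clock();
    for (i = 0; i < LOOPS; i++)
        (void) pglz_decompress(compressed, clen, decompressed, PAYLOAD_LEN);
    end = clock();

    printf("decompressed %.1f MB in %.2f s\n",
           (double) PAYLOAD_LEN * LOOPS / (1024.0 * 1024.0),
           (double) (end - start) / CLOCKS_PER_SEC);
    return 0;
}

Timing compression is the same exercise with pglz_compress() moved
inside the loop instead.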

> My laptop tests show that our decompression implementation [2] can
> be from 15% to 50% faster. Also I've noted that compression is
> extremely slow, ~30 times slower than decompression. I believe we
> can do something about it.

That's nice.

> We focus only on boosting existing codec without any considerations
> of other compression algorithms.

There is this as well. A couple of algorithms have a license
compatible with Postgres, but it may be simpler to just improve
pglz. A 10%~20% improvement is something worth doing.

> Most important questions are:
> 1. What are relevant data sets?
> 2. What are relevant CPUs? I have only Xeon-based servers and a few
> laptops/desktops with Intel CPUs
> 3. If compression is 30 times slower, should we better focus on
> compression instead of decompression?

Decompression can matter a lot for mostly-read workloads, while
compression can become a bottleneck for insert-heavy loads, so
improving compression and improving decompression should be treated
as two separate problems rather than one linked problem. Any
improvement in one or the other, or even both, is nice to have.
--
Michael
