|From:||Michael Paquier <michael(at)paquier(dot)xyz>|
|To:||Andrey Borodin <x4mmm(at)yandex-team(dot)ru>|
|Cc:||pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>|
|Subject:||Re: pglz performance|
On Mon, May 13, 2019 at 07:45:59AM +0500, Andrey Borodin wrote:
> I was reviewing Paul Ramsey's TOAST patch and noticed that there
> is a big room for improvement in performance of pglz compression and
Yes, I believe so too. pglz is a huge CPU consumer when it comes to
compression compared to more modern algorithms like lz4.
> With Vladimir we started to investigate ways to boost byte copying
> and eventually created a test suite to investigate the performance of
> compression and decompression. This is an extension with a single
> function test_pglz() which performs tests for different:
> 1. Data payloads
> 2. Compression implementations
> 3. Decompression implementations
Cool. I have something rather similar in my wallet of plugins.
This is something I worked on mainly for FPW compression in WAL.
> Currently we test mostly decompression improvements against two WALs
> and one data file taken from pgbench-generated database. Any
> suggestion on more relevant data payloads are very welcome.
Text strings made of random data and of variable length? For any test
of this kind I think it is good to focus on the performance of the
low-level calls, even going as far as a simple C wrapper on top of the
pglz APIs, so as to measure only the compression work and avoid extra
PG-related overhead like palloc(), which can be a barrier. Strings with
lengths from 1kB up to 16kB may be a good range of sizes, and it is
important to reuse the same uncompressed strings across runs so that
performance comparisons stay meaningful.
> My laptop tests show that our decompression implementation can
> be from 15% to 50% faster. Also I've noted that compression is
> extremely slow, ~30 times slower than decompression. I believe we
> can do something about it.
> We focus only on boosting the existing codec, without considering
> other compression algorithms.
There is this as well. A couple of algorithms have a license
compatible with Postgres, but it may be simpler to just improve
pglz. A 10%~20% improvement is something worth doing.
> Most important questions are:
> 1. What are relevant data sets?
> 2. What are relevant CPUs? I have only Xeon-based servers and a few
> laptops/desktops with Intel CPUs.
> 3. If compression is 30 times slower, should we better focus on
> compression instead of decompression?
Decompression can matter a lot for mostly-read workloads, and
compression can become a bottleneck for heavy-insert loads, so
improving compression and improving decompression should be treated
as two separate problems rather than one linked problem. Any
improvement in one or the other, or even both, is nice to have.