Re: Different compression methods for FPI

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Different compression methods for FPI
Date: 2021-06-01 02:06:53
Message-ID: YLWWPaq/KnVS24J4@paquier.xyz
Lists: pgsql-hackers

On Mon, May 31, 2021 at 12:33:44PM +0500, Andrey Borodin wrote:
> Would it make sense to run our own benchmarks?

Yes, I think that it could be a good idea to run some custom-made
benchmarks, as they could reveal different bottlenecks once PG is
involved.

There are a couple of factors that matter here:
- Is the algo available across as many platforms as possible? ZLIB and
LZ4 are everywhere and popular, for one, and we already plug them into
the builds. I have no idea about the others, but a quick look shows
that Zstd is supported across many systems and has a compatible
license.
- Speed and CPU usage. We should worry about that for CPU-bound
environments.
- Compression ratio, which just means monitoring the difference in WAL
volume.
- Perhaps the effect of the compression level?
- Use a fixed amount of generated WAL, meaning a set of repeatable SQL
queries run with one backend, rather than a benchmark like pgbench.
- Avoid any I/O bottleneck, so run tests on a tmpfs or ramfs.
- Avoid any extra WAL interference, such as checkpoints or autovacuum
running in parallel.

It is not easy to draw a straight line here, but one could easily say
that an algo that reduces an FPI by 90% at twice the CPU cycles is
worse than one achieving only 70%~75% compression at half the CPU
cycles, if the environment is easily constrained on CPU.

As mentioned upthread, I'd recommend designing tests like this one, or
just reusing this one:
https://www.postgresql.org/message-id/CAB7nPqSc97o-UE5paxfMUKWcxE_JioyxO1M4A0pMnmYqAnec2g@mail.gmail.com

In terms of CPU usage, we should also monitor the user and system
times of the backend, and compare the various situations. See patch
0003 posted here, which we used for wal_compression:
https://www.postgresql.org/message-id/CAB7nPqRC20=mKgu6d2st-e11_QqqbreZg-=SF+_UYsmvwNu42g@mail.gmail.com

This just uses getrusage() to get more stats.
--
Michael
