From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
Subject: Re: [HACKERS] Custom compression methods
On Sat, Dec 2, 2017 at 6:04 PM, Tomas Vondra wrote:
> On 12/01/2017 10:52 PM, Andres Freund wrote:
>> On 2017-12-01 16:14:58 -0500, Robert Haas wrote:
>>> Honestly, if we can give everybody a 4% space reduction by
>>> switching to lz4, I think that's totally worth doing -- but let's
>>> not make people choose it, let's make it the default going forward,
>>> and keep pglz support around so we don't break pg_upgrade
>>> compatibility (and so people can continue to choose it if for some
>>> reason it works better in their use case). That kind of improvement
>>> is nothing special in a specific workload, but TOAST is a pretty
>>> general-purpose mechanism. I have become, through a few bitter
>>> experiences, a strong believer in the value of trying to reduce our
>>> on-disk footprint, and knocking 4% off the size of every TOAST
>>> table in the world does not sound worthless to me -- even though
>>> context-aware compression can doubtless do a lot better.
>> +1. It's also a lot faster, and I've seen way too many workloads
>> with 50%+ of their time spent in pglz.
> TBH the 4% figure is something I mostly made up (I'm fake news!). On the
> mailing list archive (which I believe is pretty compressible) I observed
> something like 2.5% size reduction with lz4 compared to pglz, at least
> with the compression levels I've used ...
Nikita Glukhov tested compression on real json data:

Delicious bookmarks (js):
  jsonbc      931MB  (1.5x)
  jsonb+lz4d  404MB  (3.4x)

Citus customer reviews (jr):
  jsonbc      622MB  (2.5x)
  jsonb+lz4d  601MB  (2.5x)
I also attached a plot with WiredTiger size (zstd compression) in MongoDB.
Nikita has more numbers about compression.
> Other algorithms (e.g. zstd) got significantly better compression (25%)
> compared to pglz, but in exchange for longer compression times. I'm sure
> we could lower the compression level to make it faster, but that will of
> course hurt the compression ratio.
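
As a rough sketch of that level-vs-ratio trade-off (not from this thread;
it assumes the third-party "zstandard" Python package and a local sample
file, both stand-ins for whatever corpus you care about):

    import time
    import zstandard

    # Any reasonably compressible input works; sample.json is a placeholder.
    data = open("sample.json", "rb").read()

    for level in (1, 3, 9, 19):
        cctx = zstandard.ZstdCompressor(level=level)
        start = time.perf_counter()
        compressed = cctx.compress(data)
        elapsed = time.perf_counter() - start
        print(f"level {level:2d}: ratio {len(data) / len(compressed):.2f}x, "
              f"{elapsed * 1000:.1f} ms")

Higher levels should show the pattern Tomas describes: better ratios,
noticeably longer compression times.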
> I don't think switching to a different compression algorithm is a way
> forward - it was proposed and explored repeatedly in the past, and every
> time it failed for a number of reasons, most of which are still valid.
> Firstly, it's going to be quite hard (or perhaps impossible) to find an
> algorithm that is "universally better" than pglz. Some algorithms do
> work better for text documents, some for binary blobs, etc. I don't
> think there's a win-win option.
> Sure, there are workloads where pglz performs poorly (I've seen such
> cases too), but IMHO that's more an argument for the custom compression
> method approach. pglz gives you good default compression in most cases,
> and you can change it for columns where it matters, and where a
> different space/time trade-off makes sense.
> Secondly, all the previous attempts ran into some legal issues, i.e.
> licensing and/or patents. Maybe the situation changed since then (no
> idea, haven't looked into that), but in the past the "pluggable"
> approach was proposed as a way to address this.
I don't think so. Pluggable means that we now have more data types
which don't fit the old TOAST compression scheme, and we need more
flexibility. I see that in the future we could avoid decompressing the
whole TOAST value just to get one key from a document, so we would
first slice the data and compress each slice separately (see the sketch
below).
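
A minimal sketch of that slicing idea (my illustration, not the patch;
zlib and the 2 KB slice size are arbitrary assumptions):

    import zlib

    SLICE = 2048  # bytes per slice, chosen arbitrarily for the example

    def compress_sliced(value: bytes) -> list[bytes]:
        # Compress each fixed-size slice independently.
        return [zlib.compress(value[i:i + SLICE])
                for i in range(0, len(value), SLICE)]

    def read_range(slices: list[bytes], offset: int, length: int) -> bytes:
        # Decompress only the slices overlapping [offset, offset + length).
        first, last = offset // SLICE, (offset + length - 1) // SLICE
        buf = b"".join(zlib.decompress(s) for s in slices[first:last + 1])
        start = offset - first * SLICE
        return buf[start:start + length]

    doc = b"x" * 10000 + b'"needle": 42' + b"y" * 10000
    slices = compress_sliced(doc)
    print(read_range(slices, 10000, 12))  # decompresses one slice, not all

The point is that fetching one key from a large document touches only the
slices that contain it, instead of decompressing the whole value.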
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment: Screen Shot 2017-12-03 at 11.46.14.png (image/png, 183.0 KB)