Re: alternative compression algorithms?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alternative compression algorithms?
Date: 2015-04-30 01:12:10
Message-ID: 5541816A.1000303@2ndquadrant.com
Lists: pgsql-hackers

On 04/30/15 02:42, Robert Haas wrote:
> On Wed, Apr 29, 2015 at 6:55 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> I'm not convinced not compressing the data is a good idea - I suspect it
>> would only move the time to TOAST, increase memory pressure (in general and
>> in shared buffers). But I think that using a more efficient compression
>> algorithm would help a lot.
>>
>> For example, when profiling the multivariate stats patch (with multiple
>> quite large histograms), the pglz_decompress is #1 in the profile, occupying
>> more than 30% of the time. After replacing it with lz4, the data are a bit
>> larger, but it drops to ~0.25% in the profile and planning time drops
>> proportionally.
>
> That seems to imply a >100x improvement in decompression speed. Really???

Sorry, that was a somewhat misleading overstatement. The profiles (same
dataset, same workload) look like this:

pglz_decompress
---------------
  44.51%  postgres      [.] pglz_decompress
  13.60%  postgres      [.] update_match_bitmap_histogram
   8.40%  postgres      [.] float8_cmp_internal
   7.43%  postgres      [.] float8lt
   6.49%  postgres      [.] deserialize_mv_histogram
   6.23%  postgres      [.] FunctionCall2Coll
   4.06%  postgres      [.] DatumGetFloat8
   3.48%  libc-2.18.so  [.] __isnan
   1.26%  postgres      [.] clauselist_mv_selectivity
   1.09%  libc-2.18.so  [.] __memcpy_sse2_unaligned

lz4
---
  18.05%  postgres         [.] update_match_bitmap_histogram
  11.67%  postgres         [.] float8_cmp_internal
  10.53%  postgres         [.] float8lt
   8.67%  postgres         [.] FunctionCall2Coll
   8.52%  postgres         [.] deserialize_mv_histogram
   5.52%  postgres         [.] DatumGetFloat8
   4.90%  libc-2.18.so     [.] __isnan
   3.92%  liblz4.so.1.6.0  [.] 0x0000000000002603
   2.08%  liblz4.so.1.6.0  [.] 0x0000000000002847
   1.81%  postgres         [.] clauselist_mv_selectivity
   1.47%  libc-2.18.so     [.] __memcpy_sse2_unaligned
   1.33%  liblz4.so.1.6.0  [.] 0x000000000000260f
   1.16%  liblz4.so.1.6.0  [.] 0x00000000000025e3
(and then a long tail of other lz4 calls)

The difference used to be more significant, but I've since made a lot of
improvements to the update_match_bitmap method, so the total run time is
lower and the lz4 calls account for a larger share of the profile.
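Since each perf percentage is a share of that run's own total, the two
profiles cannot be compared entry by entry. A rough back-of-the-envelope
sketch follows, under two assumptions that are not in the profiles
themselves: that the non-decompression work is the same in both runs, and
that only the four lz4 entries listed above are counted (the long tail is
ignored, so the lz4 share is underestimated and the ratio is an upper bound).

```python
# Shares taken from the two profiles above (fractions of each run's total).
pglz_share = 44.51 / 100

# Only the lz4 entries visible in the listing; the long tail is NOT included,
# so this underestimates the real lz4 share.
lz4_visible = [3.92, 2.08, 1.33, 1.16]
lz4_share = sum(lz4_visible) / 100


def relative_cost(share: float) -> float:
    """Decompression time expressed relative to all other work in the run.

    Assumes the other work is identical in both runs, which makes the
    two values comparable.
    """
    return share / (1.0 - share)


ratio = relative_cost(pglz_share) / relative_cost(lz4_share)

print(f"visible lz4 share: {lz4_share:.2%}")
print(f"decompression cost ratio (upper bound): {ratio:.1f}x")
```

Under those assumptions the ratio comes out below one order of magnitude,
nowhere near the >100x the earlier numbers seemed to imply.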

The whole script (doing a lot of estimates) takes 1:50 with pglz and
only 1:25 with lz4. That's ~23% less run time, or roughly a 1.3x speedup.
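The timing comparison can be checked with a little arithmetic. This is only
a sketch: the helper name to_seconds is made up for illustration, and the
two durations are the m:ss values quoted above.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' duration string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


pglz_s = to_seconds("1:50")  # script run time with pglz
lz4_s = to_seconds("1:25")   # script run time with lz4

# Fraction of the pglz run time that is saved by switching to lz4.
time_saved = (pglz_s - lz4_s) / pglz_s

# How many times faster the lz4 run is than the pglz run.
speedup = pglz_s / lz4_s

print(f"{time_saved:.1%} less run time, {speedup:.2f}x speedup")
```

The two figures measure the same change from different baselines, which is
why "the improvement" can be quoted anywhere between the low and high
twenties in percent.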

The results are slightly unreliable because they were collected in a Xen VM,
where the overhead is non-negligible (but the same in both cases). I wouldn't
be surprised if the difference were more significant without the VM.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
