Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-02-10 04:48:08
Message-ID: CAFiTN-v=cXD8ntnVhQUnNspFZ8ZmeCxwyoiMNONmLbh-G1vmMw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 10, 2021 at 1:42 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Please remember to trim unnecessary quoted material.

Okay, I will.

> On Sun, Feb 7, 2021 at 6:45 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > [ a whole lot of quoted stuff ]
> >
> > Conclusion:
> > 1. In most cases lz4 is faster and compresses better as well.
> > 2. In Test2, when small data is incompressible, lz4 still tries to
> > compress whereas pglz doesn't try, so there is some performance loss.
> > If we want, we can fix it by setting a minimum size limit for lz4
> > as well, maybe the same one pglz uses?
>
> So my conclusion here is that perhaps there's no real problem. It
> looks like externalizing is so expensive compared to compression that
> it's worth trying to compress even though it may not always pay off.
> If, by trying to compress, we avoid externalizing, it's a huge win
> (~5x). If we try to compress and don't manage to avoid externalizing,
> it's a small loss (~6%). It's probably reasonable to expect that
> compressible data is more common than incompressible data, so not only
> is the win a lot bigger than the loss, but we should be able to expect
> it to happen a lot more often. It's not impossible that somebody could
> get bitten, but it doesn't feel like a huge risk to me.
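A rough worked version of this trade-off follows. It assumes the ~5x and ~6% figures from these tests apply uniformly per value. It writes C_e for the cost of storing a value externalized without a compression attempt and p for the fraction of values that compression keeps from being externalized. Both symbols are introduced here for illustration only.

```latex
% Simplified model, not a measurement:
%   compression avoids externalizing (probability p):  cost ~ C_e / 5
%   compression fails to avoid externalizing:          cost ~ 1.06 C_e
\begin{aligned}
\mathbb{E}[\text{cost with attempt}] &= p \cdot \tfrac{C_e}{5} + (1 - p) \cdot 1.06\,C_e \\
\text{attempt pays off} &\iff 0.2p + 1.06(1 - p) < 1 \\
&\iff 1.06 - 0.86p < 1 \\
&\iff p > \tfrac{0.06}{0.86} \approx 0.07
\end{aligned}
```

Under these assumptions, trying to compress wins on average once roughly 7% or more of such values compress enough to stay inline.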

I agree with this. That said, maybe we could also test pglz's
performance with its minimum compression limit lowered or removed, but
that should probably be an independent change.
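One way to picture the size limit discussed above is a simple gate checked before any compression attempt. It could apply to lz4, or to pglz with a lowered limit. This is a minimal sketch, not PostgreSQL code. `CompressionGate` and `gate_allows_attempt` are hypothetical names. The threshold is a placeholder supplied by the caller, not a tested value. For reference, pglz's built-in default strategy uses a `min_input_size` of 32 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-method policy; not an actual PostgreSQL struct. */
typedef struct CompressionGate
{
    size_t min_input_size;   /* values smaller than this are stored as-is */
} CompressionGate;

/*
 * Hypothetical helper: return true if a compression attempt should be
 * made for a value of the given size. Lowering min_input_size to 0
 * removes the limit entirely, which is what the proposed pglz test
 * would measure.
 */
static bool
gate_allows_attempt(const CompressionGate *gate, size_t valsize)
{
    assert(gate != NULL);
    return valsize >= gate->min_input_size;
}
```

Keeping the threshold in a per-method struct lets lz4 and pglz be given the same or different cutoffs without changing the calling code.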

> One thing that does occur to me is that it might be a good idea to
> skip compression if it doesn't change the number of chunks that will
> be stored into the TOAST table. If we compress the value but still
> need to externalize it, and the compression didn't save enough to
> reduce the number of chunks, I suppose we ideally would externalize
> the uncompressed version. That would save decompression time later,
> without really costing anything. However, I suppose that would be a
> separate improvement from this patch.

Yeah, this seems like a good idea; we can work on that in a separate thread.
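The chunk-count check Robert describes could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not PostgreSQL code. `toast_chunk_count` and `store_compressed_external` are hypothetical names. The chunk payload size is passed in as a parameter. With the default 8kB block size, PostgreSQL's `TOAST_MAX_CHUNK_SIZE` is roughly 2kB, but the exact value depends on build options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of TOAST chunks needed for a value of the given size (ceiling division). */
static size_t
toast_chunk_count(size_t size, size_t chunk_size)
{
    assert(chunk_size > 0);
    return (size + chunk_size - 1) / chunk_size;
}

/*
 * Hypothetical policy: when a value must be externalized anyway, keep
 * the compressed form only if it needs fewer chunks than the raw form.
 * Otherwise store it uncompressed, which saves decompression time on
 * every later read without using more chunks.
 */
static bool
store_compressed_external(size_t raw_size, size_t compressed_size,
                          size_t chunk_size)
{
    return toast_chunk_count(compressed_size, chunk_size) <
           toast_chunk_count(raw_size, chunk_size);
}
```

For example, with a 1996-byte chunk size, a 10000-byte value needs 6 chunks. Compressing it to 9990 bytes still needs 6, so the raw form would be kept. Compressing it to 9000 bytes needs only 5, so the compressed form would be kept.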

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
