Re: pglz compression performance, take two

From: Andrey Borodin <amborodin86(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pglz compression performance, take two
Date: 2023-02-06 02:00:20
Message-ID: CAAhFRxgE6fOGL0WYv+pNFbRq0MKTsmuRVsSfL5iymhV_L39aMg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 5, 2023 at 5:51 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 2/5/23 19:36, Andrey Borodin wrote:
> > On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin <amborodin86(at)gmail(dot)com> wrote:
> >>
> >> Hello! Please find attached v8.
> >
> > I got some interesting feedback from some patch users.
> > There was an oversight that frequently yielded results that are 1,2 or
> > 3 bytes longer than expected.
> > Looking closer I found that the correctness of the last 3-byte tail is
> > checked in two places. PFA fix for this. Previously compressed data
> > was correct, however in some cases few bytes longer than the result of
> > current pglz implementation.
> >
>
> Thanks. What were the consequences of the issue? Lower compression
> ratio, or did we then fail to decompress the data (or would current pglz
> implementation fail to decompress it)?
>
The data decompressed fine. But the extension tests (Citus's columnar
engine) hard-code a lot of compression ratios.
And there is still one more test where the optimized version produces
output that is 1 byte longer. I'm trying to find the cause, but with no
success yet.

There are known and documented cases where the optimized pglz version
would do so: good_match without the division by 10, and memcmp by 4
bytes. But even with both disabled, I still observe compression results
that are 1 byte longer... The problem is that the length changes after
deleting some data, so the compression of that particular sequence
seems to be somewhere far away.
It was fun at the beginning to hunt for 1 byte. But the weekend is
ending, and it seems that byte has slipped away from me again...
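
To illustrate the memcmp-by-4 case, here is a rough sketch. It is not
the code from the patch: the function names and SKETCH_MIN_MATCH are
made up for this example, and it assumes the shortest match worth
encoding is 3 bytes. It only shows how a 4-byte prefix check can reject
a candidate that a byte-wise check would accept, so that the 3 bytes
end up emitted as literals instead of a shorter back-reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant for this sketch: shortest match worth encoding. */
#define SKETCH_MIN_MATCH 3

/* Length of the common prefix of a and b, capped at max_len. */
static size_t
common_prefix(const char *a, const char *b, size_t max_len)
{
    size_t len = 0;

    assert(a != NULL && b != NULL);
    while (len < max_len && a[len] == b[len])
        len++;
    return len;
}

/* Byte-wise check: accept any match of at least SKETCH_MIN_MATCH bytes. */
static size_t
match_bytewise(const char *cur, const char *cand, size_t avail)
{
    size_t len = common_prefix(cur, cand, avail);

    return len >= SKETCH_MIN_MATCH ? len : 0;
}

/*
 * 4-byte fast path: reject the candidate unless its first 4 bytes are
 * equal. Cheaper per candidate, but a match of exactly 3 bytes is lost.
 */
static size_t
match_fourbyte(const char *cur, const char *cand, size_t avail)
{
    if (avail < 4 || memcmp(cur, cand, 4) != 0)
        return 0;
    return common_prefix(cur, cand, avail);
}
```

For the inputs "abcX" and "abcY" the byte-wise check reports a 3-byte
match, while the 4-byte check reports none. Both agree once the common
prefix is 4 bytes or longer.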

Best regards, Andrey Borodin.
