Re: pglz performance

From: Tels <nospam-pg-abuse(at)bloodgate(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Gasper Zejn <zejn(at)owca(dot)info>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pglz performance
Date: 2019-11-03 09:24:43
Message-ID: d56c85b989a3bd8c0a98d79553276b0e@bloodgate.com
Lists: pgsql-hackers

Hello Andrey,

On 2019-11-02 12:30, Andrey Borodin wrote:
>> On 1 Nov 2019, at 18:48, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
>> wrote:
> PFA two patches:
> v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in
> test_pglz extension)
> v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known
> as 'hacked8')

Looking at the patches, it seems only the case of a match is changed.
But when we observe a literal byte, this is copied byte-by-byte with:

    else
    {
        /*
         * An unset control bit means LITERAL BYTE. So we just
         * copy one from INPUT to OUTPUT.
         */
        *dp++ = *sp++;
    }

Maybe we can optimize this, too. For instance, you could just increase a
counter:

    else
    {
        /*
         * An unset control bit means LITERAL BYTE. We count
         * these and copy them later.
         */
        literal_bytes++;
    }

and in the case of:

    if (ctrl & 1)
    {
        /* First copy all the pending literal bytes */
        if (literal_bytes > 0)
        {
            /* memcpy(dest, src, n): dp is the output, sp the input */
            memcpy(dp, sp, literal_bytes);
            sp += literal_bytes;
            dp += literal_bytes;
            literal_bytes = 0;
        }

(Code untested!)

The same would need to be done at the very end, if the input ends
without any new CTRL-byte.
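
To make the idea concrete, here is a minimal, self-contained sketch of a
decoder that defers literal copies. It is not the patch under discussion:
toy_decompress and flush_literals are hypothetical names, and the tag layout
(two bytes, length in the low nibble plus 3, one extension byte when the
length reaches 18) is my assumption of how pglz encodes matches. Note that in
this sketch the pending run is also flushed before each control byte is read,
because the control byte sits in the input between the literals, so one
memcpy never covers more than 8 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Copy the pending literal run from input to output and reset the counter. */
static void
flush_literals(const unsigned char **sp, unsigned char **dp, size_t *pending)
{
    if (*pending > 0)
    {
        memcpy(*dp, *sp, *pending);     /* memcpy(dest, src, n) */
        *sp += *pending;
        *dp += *pending;
        *pending = 0;
    }
}

/*
 * Toy decoder with deferred literal copies (assumed pglz-like format).
 * Returns the number of bytes written, or -1 on malformed input.
 */
static long
toy_decompress(const unsigned char *src, size_t slen,
               unsigned char *dst, size_t dcap)
{
    const unsigned char *sp = src;
    const unsigned char *srcend = src + slen;
    unsigned char *dp = dst;
    unsigned char *dstend = dst + dcap;
    size_t      pending = 0;    /* literals counted but not yet copied */

    while (sp < srcend)
    {
        unsigned char ctrl = *sp++;
        int         bit;

        /* sp + pending is the next unread input byte */
        for (bit = 0; bit < 8 && sp + pending < srcend; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                size_t      len;
                size_t      off;

                /* A match: first copy all the pending literal bytes */
                flush_literals(&sp, &dp, &pending);
                if (srcend - sp < 2)
                    return -1;
                len = (sp[0] & 0x0f) + 3;
                off = ((size_t) (sp[0] & 0xf0) << 4) | sp[1];
                sp += 2;
                if (len == 18)
                {
                    if (sp >= srcend)
                        return -1;
                    len += *sp++;
                }
                if (off == 0 || off > (size_t) (dp - dst) ||
                    len > (size_t) (dstend - dp))
                    return -1;
                /* Byte-wise copy: the match may overlap its own output */
                while (len--)
                {
                    *dp = dp[-(ptrdiff_t) off];
                    dp++;
                }
            }
            else
            {
                /* A literal: only count it, copy it later */
                if (pending >= (size_t) (dstend - dp))
                    return -1;
                pending++;
            }
        }
        /* Flush before the next control byte and at end of input */
        flush_literals(&sp, &dp, &pending);
    }
    return (long) (dp - dst);
}
```

For example, the six input bytes 0x08 'a' 'b' 'c' 0x03 0x03 (three literals,
then a match with offset 3 and length 6) decode to "abcabcabc" with a single
3-byte memcpy for the literals.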

Whether that gains us anything depends on how common literal bytes are.
It might be that highly compressible input has almost none, while input
that is a mix of incompressible and compressible strings might have
longer stretches. One example would be something like a SHA-256 hash that
is repeated twice. The first instance would be incompressible, the
second one would be just a copy. This might not happen that often in
practical inputs, though.
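
One way to check how common literal bytes are in a given input would be to
walk the compressed stream and tally items. The following is only a rough
sketch under the same assumed tag layout as pglz (two-byte tags, a third byte
when the length nibble is 0x0f); toy_stats and toy_count_items are
hypothetical names, not existing functions.

```c
#include <stddef.h>

typedef struct
{
    size_t      literals;       /* number of literal bytes seen */
    size_t      matches;        /* number of match tags seen */
} toy_stats;

/* Walk a compressed stream and count literal bytes versus match tags. */
static toy_stats
toy_count_items(const unsigned char *src, size_t slen)
{
    toy_stats   st = {0, 0};
    size_t      i = 0;

    while (i < slen)
    {
        unsigned char ctrl = src[i++];
        int         bit;

        for (bit = 0; bit < 8 && i < slen; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* Tag is 2 bytes, or 3 when the length nibble is 0x0f */
                i += ((src[i] & 0x0f) == 0x0f) ? 3 : 2;
                st.matches++;
            }
            else
            {
                i++;
                st.literals++;
            }
        }
    }
    return st;
}
```

For the repeated-hash case, the stream would hold 32 literal bytes (spread
over four control bytes) followed by one match tag, so the counts come out as
32 literals and 1 match.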

I wonder if you agree and what would happen if you try this variant on
your corpus tests.

Best regards,

Tels
