Re: pglz performance

From: Oleg Bartunov <obartunov(at)postgrespro(dot)ru>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Gasper Zejn <zejn(at)owca(dot)info>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pglz performance
Date: 2019-09-15 10:57:48
Message-ID: CAF4Au4zgAWfhXvm_8gQ2denbCB2wE34oJ+uj1kFrZ1a9+BaiNw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 4, 2019 at 12:22 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> Hi, Peter! Thanks for looking into this.
>
> > On 4 Sep 2019, at 14:09, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> >
> > On 2019-06-24 10:44, Andrey Borodin wrote:
> >>> On 18 May 2019, at 11:44, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> >>>
> >> Hi!
> >> Here's a rebased version of the patches.
> >>
> >> Best regards, Andrey Borodin.
> >
> > I think this is the most recent patch for the CF entry
> > <https://commitfest.postgresql.org/24/2119/>.
> >
> > What about the two patches? Which one is better?
> In our observations, pglz_decompress_hacked.patch is the best on most of the tested platforms.
> The difference is that pglz_decompress_hacked8.patch will not apply the optimization if the decompressed match is not greater than 8 bytes. This optimization was suggested by Tom, which is why we benchmarked it specifically.
>
> > Have you also considered using memmove() to deal with the overlap issue?
> Yes, memmove() resolves the ambiguity of copying overlapping regions in a way that is not compatible with pglz. In the proposed patch we never copy overlapping regions.
>
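
To illustrate the overlap point above, here is a minimal sketch in C. It is not taken from the patch; the helper names copy_match_bytewise and copy_match_chunked are hypothetical, and the chunked loop is only one way to get the same result without ever passing overlapping regions to memcpy(). It assumes an LZ-style match is described by a back-offset `off` and a length `len`, where `len` may exceed `off`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: classic LZ match copy, one byte at a time.
 * When len > off the source runs into bytes written by this same loop,
 * so the last `off` bytes are repeated as a pattern.
 */
static void
copy_match_bytewise(unsigned char *dp, size_t off, size_t len)
{
    const unsigned char *sp = dp - off;

    while (len-- > 0)
        *dp++ = *sp++;
}

/*
 * Hypothetical helper: same output, but every memcpy() call has
 * disjoint source and destination.  After each chunk the repeated
 * pattern behind dp is twice as long, so the offset can be doubled.
 */
static void
copy_match_chunked(unsigned char *dp, size_t off, size_t len)
{
    assert(off > 0);            /* off == 0 would never make progress */

    while (off < len)
    {
        memcpy(dp, dp - off, off);  /* [dp-off, dp) and [dp, dp+off) are disjoint */
        len -= off;
        dp += off;
        off += off;
    }
    memcpy(dp, dp - off, len);      /* here len <= off, so no overlap */
}
```

With a buffer holding "ab......" and a match of off = 2, len = 6 written at position 2, both helpers produce "abababab". A single memmove(buf + 2, buf, 6) on the same buffer produces "abab...." instead, because memmove() copies the source as it was before the call rather than repeating the pattern.
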
> > Benchmarks have been posted in this thread. Where is the benchmarking
> > tool? Should we include that in the source somehow?
>
> The benchmarking tool is here [0]. The code of the benchmarking tool does not adhere to our standards in some places, so we did not consider its inclusion in core.
> However, the most questionable part of the benchmarking is the choice of test data. It's about 100 MB of useless WALs, a datafile and valuable Shakespeare writings.

Why not use the 'Silesia compression corpus'
(http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia), which is used by
lzbench (https://github.com/inikep/lzbench)? Teodor and I remember
that testing on non-English texts could be very important.

>
> Best regards, Andrey Borodin.
>
>
> [0] https://github.com/x4m/test_pglz
>
>
>

--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
