Re: pglz performance

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Gasper Zejn <zejn(at)owca(dot)info>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pglz performance
Date: 2019-11-02 11:30:22
Message-ID: 27F232D7-25C5-4D12-AFA0-50EEBAB98E9C@yandex-team.ru
Lists: pgsql-hackers

> On 1 Nov 2019, at 18:48, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> On 2019-Nov-01, Peter Eisentraut wrote:
>
>> On 2019-10-25 07:05, Andrey Borodin wrote:
>>>> On 21 Oct 2019, at 14:09, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>>>
>>>> With the Silesian corpus, pglz_decompress_hacked actually decreases performance on high-entropy data.
>>>> Meanwhile, pglz_decompress_hacked8 is still faster than the usual pglz_decompress.
>>>> In spite of these benchmarks, I think that pglz_decompress_hacked8 is the safer option.
>>>
>>> Here's v3, which takes into account recent benchmarks with the Silesian Corpus and has better comments.
>>
>> Your message from 21 October appears to say that this change makes the
>> performance worse. So I don't know how to proceed with this.
>
> As I understand that report, in these results "less is better", so the
> hacked8 variant shows better performance (33.8) than current (42.5).
> The "hacked" variant shows worse performance (48.2) than the current
> code.
This is correct. Thanks, Álvaro.

> The "in spite" phrase seems to have been a mistake.
Yes. Sorry, I actually thought that "in spite" was the opposite of "despite" and meant "in view of".

> I am surprised that there is so much variability in the performance
> numbers, though, based on such small tweaks of the code.
The Silesian Corpus is very different from WAL and PG data files. Data files are rich in long runs of the same byte. Because these runs are long, the memcpy method copies them very efficiently.
The Silesian Corpus, by contrast, is rich in short matches of only a few bytes, so there is little for memcpy to speed up.
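
To illustrate the difference, here is a minimal C sketch of the copy strategies being compared. It is not the code from the attached patches: the function names, the doubling loop, and the 8-byte threshold (guessed from the 'hacked8' name) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold; assumed from the "hacked8" name, not taken from the patch. */
#define SHORT_MATCH_LIMIT 8

/* Byte-by-byte copy of a match: correct even when source and destination overlap. */
unsigned char *
copy_match_bytewise(unsigned char *dp, size_t off, size_t len)
{
    assert(off > 0);
    while (len-- > 0)
    {
        *dp = *(dp - off);
        dp++;
    }
    return dp;
}

/*
 * memcpy-based copy. memcpy must not be given overlapping regions, so while
 * the match is longer than the offset we copy one non-overlapping chunk of
 * "off" bytes and then double the offset: the output written so far is
 * periodic, so a twice-as-distant source holds the same bytes.
 */
unsigned char *
copy_match_memcpy(unsigned char *dp, size_t off, size_t len)
{
    assert(off > 0);
    while (off < len)
    {
        memcpy(dp, dp - off, off);
        len -= off;
        dp += off;
        off += off;
    }
    memcpy(dp, dp - off, len);  /* here len <= off, so no overlap */
    return dp + len;
}

/* Hybrid: byte loop for short matches, memcpy only for long ones. */
unsigned char *
copy_match_hybrid(unsigned char *dp, size_t off, size_t len)
{
    if (len <= SHORT_MATCH_LIMIT)
        return copy_match_bytewise(dp, off, len);
    return copy_match_memcpy(dp, off, len);
}
```

For a long run of one byte (off = 1, large len) the memcpy variant needs only a handful of calls with growing chunk sizes, while for a 3-byte match it still pays the cost of a call to copy almost nothing, which is the case the hybrid tries to avoid.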

> On 1 Nov 2019, at 19:59, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I'd try running the benchmarks to verify the numbers, and maybe do some
> additional tests, but it's not clear to me which patches I should use.
Cool, thanks!

> I think the last patches with 'hacked' and 'hacked8' in the name are a
> couple of months old, and the recent posts attach just a single patch.
> Andrey, can you post current versions of both patches?
PFA two patches:
v4-0001-Use-memcpy-in-pglz-decompression.patch (known as 'hacked' in test_pglz extension)
v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch (known as 'hacked8')

Best regards, Andrey Borodin.

Attachment Content-Type Size
v4-0001-Use-memcpy-in-pglz-decompression.patch application/octet-stream 3.2 KB
v4-0001-Use-memcpy-in-pglz-decompression-for-long-matches.patch application/octet-stream 3.4 KB
