Re: pglz compression performance, take two

From: Andrey Borodin <amborodin86(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pglz compression performance, take two
Date: 2022-11-27 18:43:54
Message-ID: CAAhFRxhFvuO-yUu8VuT3k6jKeJUjzA0ff3rJS6qtPLHdq3Epzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas,

On Sun, Nov 27, 2022 at 8:02 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> 1) For PGLZ_HISTORY_SIZE it uses literal 0x0fff, with the explanation:
>
> /* to avoid compare in iteration */
>
> which I think means intent to use this value as a bit mask, but then the
> only place using PGLZ_HISTORY_SIZE does
>
> if (hist_next == PGLZ_HISTORY_SIZE) ...
>
> i.e. a comparison. Maybe I misunderstand the comment, though.
>

As far as I recollect, it's a leftover from an attempt to optimize the
code into branchless version
I.e. instead of
if(hist_next>=PGLZ_HISTORY_SIZE)
hist_next = 1;
use something like hist_next = hist_next & PGLZ_HISTORY_SIZE.
But the optimization did not show any measurable impact and was
improperly rolled back.

>
> 2) PGLZ_HistEntry was modified and replaces links (pointers) with
> indexes, but the comments still talk about "links", so maybe that needs
> to be changed.

The offsets still form a "linked list"... however I removed some
mentions of pointers, since they are not pointers anymore.

> Also, I wonder why next_id is int16 while hist_idx is
> uint16 (and also why id vs. idx)?

+1. I'd call them next and hash.

int16 next; /* instead of next_d */
uint16 hash; /* instead of hist_idx */

What do you think?
hist_idx comes from the function name... I'm not sure how far renaming
should go here.

>
> 3) minor formatting of comments
>
> 4) the comment in pglz_find_match about traversing the history seems too
> early - it's before handling invalid entries and cleanup, but it does
> not talk about that at all, and the actual while loop is after that.

Yes, this seems right for me.

PFA review fixes (step 1 is unchanged).
I did not include next_id->next and hist_idx -> hash rename.

Thank you!

Best regards, Andrey Borodin.

Attachment Content-Type Size
v7-0002-Review-fixes.patch application/octet-stream 3.6 KB
v7-0001-Reorganize-pglz-compression-code.patch application/octet-stream 20.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2022-11-27 18:56:54 Re: Use fadvise in wal replay
Previous Message Peter Geoghegan 2022-11-27 18:34:28 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation