Re: pgsql: Optimize pglz compressor for small inputs.

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgsql: Optimize pglz compressor for small inputs.
Date: 2013-07-17 17:36:44
Message-ID: 51E6D62C.7060302@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On 14.07.2013 20:12, Stephen Frost wrote:
> * Heikki Linnakangas (heikki(dot)linnakangas(at)iki(dot)fi) wrote:
>> This patch alleviates that in two ways. First, instead of storing pointers
>> in the hash table, store 16-bit indexes into the hist_entries array. That
>> slashes the size of the hash table to 1/2 or 1/4 of the original, depending
>> on the pointer width. Secondly, adjust the size of the hash table based on
>> input size. For very small inputs, you don't need a large hash table to
>> avoid collisions.
>
> The coverity scanner has a bit of an issue with this patch which, at
> least on first blush, looks like a valid concern.
>
> While the change in pg_lzcompress.c:pglz_find_match() to loop on:
>
> while (hent != INVALID_ENTRY_PTR)
> {
> const char *ip = input;
> const char *hp = hent->pos;
>
> looks good, and INVALID_ENTRY_PTR is the address of the first entry in
> the array (and can't be NULL), towards the end of the loop we do:
>
> hent = hent->next;
> if (hent)
> ...
>
> Should we really be checking for 'hent != INVALID_ENTRY_PTR' here? If
> not, and hent really can end up as NULL, then we're going to segfault
> on the next loop due to the unchecked 'hent->pos' early in the loop.
> If hent can never be NULL, then we probably don't need this check at
> all.

hent can never be NULL, the code should indeed check for 'hent !=
INVALID_ENTRY_PTR'. The check is not required from a correctness point
of view, the idea is just to avoid calculating the 'good_match'
variable, if you're going to fall out of the loop anyway.

I'm actually a bit surprised the compiler doesn't optimize it that way
anyway, without the explicit if-block, but at least "gcc -O2" (version
4.7.3) seem to do that. So, I guess we should keep the check.

Committed '(hent)' -> '(hent != INVALID_ENTRY_PTR)'. Thanks for the report!

- Heikki

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-07-17 17:37:34 pgsql: Fix end-of-loop optimization in pglz_find_match() function.
Previous Message Fujii Masao 2013-07-17 16:24:26 pgsql: Fix typo in previous pgbench --progress patch.

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2013-07-17 17:43:43 Re: pg_filedump 9.3: checksums (and a few other fixes)
Previous Message Fujii Masao 2013-07-17 17:31:07 Re: Fix pgstattuple/pgstatindex to use regclass-type as the argument