Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Hans Buschmann <buschmann(at)nidsa(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
Date: 2021-08-16 17:06:31
Message-ID: CAEudQAro-n+HG+=YCjgVcKQ095OXzjPimo9L1YWjdT5EFdVNMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em seg., 16 de ago. de 2021 às 13:19, Hans Buschmann <buschmann(at)nidsa(dot)net>
escreveu:

> ------------------------------
> *Von:* Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
> *Gesendet:* Montag, 16. August 2021 17:04
> *An:* Hans Buschmann
> *Cc:* pgsql-hackers(at)postgresql(dot)org
> *Betreff:* Re: PG14: Avoid checking output-buffer-length for every
> encoded byte during pg_hex_encode
>
> Hello Ranier,
>
> Thank you for your quick response.
>
> >Is always good to remove immutables from loops, if safe and secure.
>
> I think it's worse because a loop changed variable is involved in the
> test, so it seems to be not immutable.
>
> >Decode has regression too.
>
> good catch, I overlooked it.
>
> >I think that we can take one more step here.
> >pg_hex_decode can be improved too.
>
> +1
>
> By looking a bit more precisely I detected the same checks in
> common/base64.c also involving loop-changed variables. Could you please
> make the same changes to base64.c?
>
I will take a look.

>
> >We can remove limitations from all functions that use hex functions.
> >byteaout from (varlena.c) has an artificial limitation to handle only
> MaxAllocSize length, but
> >nothing prevents her from being prepared for that limitation to be
> removed one day.
>
> +1, but I don't know all implications of the type change to size_t
>
uint64 and size_t in 64 bits are equivalents.
uint64 and size_t in 32 bits can be weird, but anyway size_t at 32 bits can
handle 1GB.

>
> >Please, take a look at the new version attached.
>
> Two remarks for decoding (by eyeball inspection of patch file only
> because of still struggling with git etc.):
>
> 1. The odd-number-of-digits-check for decoding is still part of the loop.
> It should be before the loop and called only once (by testing for an even
> number of srclen)
> So we can eliminate the block if (s == srcend) {} inside the loop!
>
I'm afraid that is not possible.
I tested some variations for this test and regress fails at strings tests.
Anyway for test purposes, I changed it temporarily in this version, but I'm
afraid we'll have to go back.

> 2. I noticed that the error messages for hex_decode should be changed as
>
> s/encoding/decoding/
>
> (as in pg_log_fatal("overflow of destination buffer in hex encoding");)
>
Ok. Changed.

> >If possible can you review the tests if there is an overflow at
> >pg_hex_encode and pg_hex_decode functions?
>
> I would prefer to wait for another patch version from you (my development
> troubles), perhaps also dealing with base64.c
>
base64.c can be made in another patch.

> I don't know how and where any call to the encoding/decoding functions
> creates an overflow situation in normal operation.
>

> I will nonceless try the tests but I don't have any experience with it.
>
Thanks.

> BTW the root cause for my work is an attempt to vastly accelerate these
> functions (hex and base64 encode/decode), but this is left for another day
> (not finished yet) and certainly only on master/new development.
>
I think this can speed up a little.

> Lateron support on this topic would be highly appreciated...
>
If I can help, count on me.

regards,
Ranier Vilela

Attachment Content-Type Size
0001-improve-hex-functions-handling-v1.patch application/octet-stream 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2021-08-16 17:13:55 Re: Autovacuum on partitioned table (autoanalyze)
Previous Message Jacob Champion 2021-08-16 17:04:32 Re: badly calculated width of emoji in psql