Re: Moving other hex functions to /common

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Moving other hex functions to /common
Date: 2021-01-02 05:25:33
Message-ID: X/ADzYzkP0ExrBq1@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 01, 2021 at 03:06:13PM -0500, Bruce Momjian wrote:
> Thanks. I had to learn how to squash my commits into a new branch and
> then generate a format-patch on that:
>
> https://bugsdb.com/_en/debug/8b648ec395b86be32efa9629cb006d74
>
> I wanted to see how the cfbot liked my original patch with the renames,
> and it didn't, so now I know I have to use this method for the
> commitfest. Patch attached.

There are many ways to do that, indeed. On my end, I use a local
branch. and then apply a set of git reset --soft before recreating a
single commit.

> Let me get my patch building on the cfbot and then I will address each
> of these. I am trying to do one stage at a time since I am still
> learning the process. Thanks.

No problem. On my end, this stuff has been itching me for a couple of
days and I could not recall why.. Until I remembered that the design
of the hex APIs in your patch is weak with overflow handling because
we don't pass down to the function the size of the destination buffer.
We have finished with a similar set of issues on the past with SCRAM
and base64, with has led to CVE-2019-10164 and the improvements done
in cfc40d3. So I think that we need to improve things in a safer way.
Mapping with the design for base64, I have finished with the attached
patch, and the following set:
+extern int64 pg_hex_decode(const char *src, int64 len, char *dst, int64 dstlen);
+extern int64 pg_hex_encode(const char *src, int64 len, char *dst, int64 dstlen);
+extern int64 pg_hex_enc_len(int64 srclen);
+extern int64 pg_hex_dec_len(int64 srclen);

This ensures that the result never overflows, which explains the
introduction of an error code for the encoding part, and does not
use elog()/pg_log() so as external libraries can use them. ECPG uses
long variables in a couple of places, explaining why it feels safer to
use int64. int should give enough room to any caller of those APIs,
but there is no drawback in using a 64-bit API either, and I don't
think it is worth the risk to break ECPG either for long handling,
even if I don't believe either that folks are going to work on strings
larger than 2GB.

Things get trickier for the bytea input/output because we want more
generic error messages depending for invalid values or an incorrect
number of digits, which is why I have left the copies in encode.c.
This design could be easily extended with more types of error codes,
though I am not sure if that's worth bothering.

Even with that, this leads to much more sanity for hex buffer
manipulation in backup manifests (I don't think that using
PG_SHAXXX_DIGEST_STRING_LENGTH is a good idea either, I'd like to get
rid of it in the long-term) and ECPG, so that's clearly a gain.

I don't have a Windows host at hand, though I think that it should
work there correctly. What do you think about the ideas in the
attached patch?
--
Michael

Attachment Content-Type Size
0001-Refactor-the-hex-code.patch text/x-diff 22.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-01-02 05:27:29 Re: Move --data-checksums to common options in initdb --help
Previous Message Bharath Rupireddy 2021-01-02 05:23:40 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit