Re: Moving other hex functions to /common

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Moving other hex functions to /common
Date: 2021-01-05 03:47:39
Message-ID: 20210105034739.GG7432@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 2, 2021 at 02:25:33PM +0900, Michael Paquier wrote:
> > Let me get my patch building on the cfbot and then I will address each
> > of these. I am trying to do one stage at a time since I am still
> > learning the process. Thanks.
>
> No problem. On my end, this stuff has been itching me for a couple of
> days and I could not recall why.. Until I remembered that the design
> of the hex APIs in your patch is weak with overflow handling because
> we don't pass down to the function the size of the destination buffer.
> We have finished with a similar set of issues on the past with SCRAM
> and base64, with has led to CVE-2019-10164 and the improvements done
> in cfc40d3. So I think that we need to improve things in a safer way.
> Mapping with the design for base64, I have finished with the attached
> patch, and the following set:
> +extern int64 pg_hex_decode(const char *src, int64 len, char *dst, int64 dstlen);
> +extern int64 pg_hex_encode(const char *src, int64 len, char *dst, int64 dstlen);
> +extern int64 pg_hex_enc_len(int64 srclen);
> +extern int64 pg_hex_dec_len(int64 srclen);

I can see the value of passing the destination length to the hex
functions, and I think you have to pass the src length to pg_hex_encode
since the input can be binary. I assume the pg_hex_decode doesn't need
the source length because it is a null-terminated C string, right?
However, pg_base64_decode(const char *src, size_t len, char *dst) does
pass in the src length, so I can see we should just make it the same. I
also agree it is unclear if 'len' is the src or dst len, so your patch
fixes that with the names. Also, is there a reason you use int64
instead of the size_t used by bytea?

> This ensures that the result never overflows, which explains the
> introduction of an error code for the encoding part, and does not
> use elog()/pg_log() so as external libraries can use them. ECPG uses
> long variables in a couple of places, explaining why it feels safer to
> use int64. int should give enough room to any caller of those APIs,
> but there is no drawback in using a 64-bit API either, and I don't
> think it is worth the risk to break ECPG either for long handling,
> even if I don't believe either that folks are going to work on strings
> larger than 2GB.

Might as well just have hex match what bytea and esc does.

> Things get trickier for the bytea input/output because we want more
> generic error messages depending for invalid values or an incorrect
> number of digits, which is why I have left the copies in encode.c.
> This design could be easily extended with more types of error codes,
> though I am not sure if that's worth bothering.

I don't think removing the two error type reporting from src/common, and
then having to add it back into adt/encode.c makes sense. There are
two, soon three, that want those two error reports, so I am thinking we
are best to just leave ecpg alone since it is just a library that
doesn't want more than one error reporting. I don't think we will have
many more library call cases.

> Even with that, this leads to much more sanity for hex buffer
> manipulation in backup manifests (I don't think that using
> PG_SHAXXX_DIGEST_STRING_LENGTH is a good idea either, I'd like to get
> rid of it in the long-term) and ECPG, so that's clearly a gain.
>
> I don't have a Windows host at hand, though I think that it should
> work there correctly. What do you think about the ideas in the
> attached patch?

I think your patch has a few mistakes. First, I think you removed the
hex_decode files from the file system, rather than removing it via git,
so the diff didn't apply in the cfbot. Second, I don't think ecpg ever
used libpgcommon before. For non-Windows, libpgcommon got referenced
anyway in the build, so non-Windows compiles worked, but in Windows,
that was not referenced, meaning the cfbot failed on ecpglib. I have
modified the attached patch with both of these fixed --- let's see how
it likes this version.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

Attachment Content-Type Size
hex2.diff.gz application/gzip 6.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-01-05 03:54:02 Re: macOS SIP, next try
Previous Message Peter Geoghegan 2021-01-05 03:46:41 Re: pg_waldump/heapdesc.c and struct field names