Quick Links

Re: Inconsistent printf placeholders

From:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To:	peter(at)eisentraut(dot)org
Cc:	dgrowleyml(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Inconsistent printf placeholders
Date:	2024-03-21 08:16:32
Message-ID:	20240321.171632.804394642968687545.horikyota.ntt@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thank you for looking this.

At Tue, 19 Mar 2024 10:50:23 +0100, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote in
> On 15.03.24 08:20, Kyotaro Horiguchi wrote:
> > diff --git a/src/backend/access/transam/twophase.c
> > b/src/backend/access/transam/twophase.c
> > @@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool
> > missing_ok)
> > errmsg("could not read file \"%s\":
> > %m", path)));
> > else
> > ereport(ERROR,
> > - (errmsg("could not read file \"%s\":
> > - read %d of %lld",
> > - path, r, (long long
> > - int) stat.st_size)));
> > + (errmsg("could not read file \"%s\": read %zd of %zu",
> > + path, r, (Size) stat.st_size)));
> > }
> > pgstat_report_wait_end();
>
> This might be worse, because stat.st_size is of type off_t, which
> could be smaller than Size/size_t.

I think you were trying to mention that off_t could be wider than
size_t and you're right in that point. I thought that it is safe since
we are trying to read the whole content of the file at once here into
palloc'ed memory.

However, on second thought, if st_size is out of the range of ssize_t,
and palloc accepts that size, at least on Linux, read(2) reads only
0x7ffff000 bytes and raches the error reporting. Addition to that,
this size was closer to the memory allocation size limit than I had
thought.

As the result, I removed the change. However, I kept the change of the
type of variable "r" and corresponding placeholder %zd.

> > diff --git a/src/backend/libpq/be-secure-gssapi.c
> > b/src/backend/libpq/be-secure-gssapi.c
> > index bc04e78abb..68645b4519 100644
> > --- a/src/backend/libpq/be-secure-gssapi.c
> > +++ b/src/backend/libpq/be-secure-gssapi.c
> > @@ -572,9 +572,9 @@ secure_open_gssapi(Port *port)
> > if (input.length > PQ_GSS_RECV_BUFFER_SIZE)
> > {
> > ereport(COMMERROR,
> > - (errmsg("oversize GSSAPI packet sent
> > - by the client (%zu > %d)",
> > + (errmsg("oversize GSSAPI packet sent by the client (%zu > %zu)",
> > (size_t) input.length,
> > - PQ_GSS_RECV_BUFFER_SIZE)));
> > + (size_t) PQ_GSS_RECV_BUFFER_SIZE)));
> > return -1;
> > }
> >
>
> Might be better to add that cast to the definition of
> PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit.

As far as I see, the only exceptional uses I found were a comparison
with int values, and being passed as an OM_uint32 (to one of the
parameters of gss_wrap_size_limit()). Therefore, I agree that it is
beneficial. By the way, we currently define Size as the same as size_t
(since 1998). Is it correct to consider Size as merely for backward
compatibility and we should use size_t for new code? I used size_t in
the modified part in the attached patch.

> > diff --git a/src/backend/replication/repl_gram.y
> > b/src/backend/replication/repl_gram.y
> > index 7474f5bd67..baa76280b9 100644
> > --- a/src/backend/replication/repl_gram.y
> > +++ b/src/backend/replication/repl_gram.y
> > @@ -312,11 +312,6 @@ timeline_history:
> > {
> > TimeLineHistoryCmd *cmd;
> > - if ($2 <= 0)
> > - ereport(ERROR,
> > - (errcode(ERRCODE_SYNTAX_ERROR),
> > - errmsg("invalid
> > - timeline %u",
> > - $2)));
> > -
...
> I don't think this is correct. It loses the check for == 0.

Ugh. It's my mistake. So we need to consider unifying the messages
again. In walsummaryfuncs.c, %lld is required, but it's silly for the
uses in repl_gram.y. Finally, I chose not to change anything here.

> > diff --git a/src/backend/tsearch/to_tsany.c
> > b/src/backend/tsearch/to_tsany.c
> > index 88cba58cba..9d21178107 100644
> > --- a/src/backend/tsearch/to_tsany.c
> > +++ b/src/backend/tsearch/to_tsany.c
> > @@ -191,7 +191,8 @@ make_tsvector(ParsedText *prs)
> > if (lenstr > MAXSTRPOS)
> > ereport(ERROR,
> > (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> > - errmsg("string is too long for tsvector (%d
> > - bytes, max %d bytes)", lenstr, MAXSTRPOS)));
> > + /* cast values to avoid extra translatable messages */
> > + errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)",
> > (long) lenstr, (long) MAXSTRPOS)));
> > totallen = CALCDATASIZE(prs->curwords, lenstr);
> > in = (TSVector) palloc0(totallen);
>
> I think it would be better instead to change the message in
> tsvectorin() to *not* use long. The size of long is unportable, so I
> would rather avoid using it at all.

The casts to long are tentative only to adjust to the corresponding
placeholder, and in this context, portability concerns are not
applicable. However, those casts are apparently useless. As you
suggested, I tried to change tsvectorin() instead, but there's a
problem here.

tsvector.c:224
> errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)",
> (long) (cur - tmpbuf), (long) MAXSTRPOS)));

cur and tmpbuf are pointers. The byte width of the subtraction results
varies by architecture. However, the surrounding code apparently
assumes that the difference fits within an int. I added a cast to int
for the pointer arithmetic here. (Although I'm not sure this is the
right direction.)

> > diff --git a/src/backend/utils/adt/varlena.c
> > b/src/backend/utils/adt/varlena.c
> > index 8d28dd42ce..5de490b569 100644
> > --- a/src/backend/utils/adt/varlena.c
> > +++ b/src/backend/utils/adt/varlena.c
> > @@ -3217,8 +3217,9 @@ byteaGetByte(PG_FUNCTION_ARGS)
> > if (n < 0 || n >= len)
> > ereport(ERROR,
> > (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
> > - errmsg("index %d out of valid range, 0..%d",
> > - n, len - 1)));
> > + /* cast values to avoid extra translable messages */
> > + errmsg("index %lld out of valid range, 0..%lld",
> > + (long long)n, (long long) len - 1)));
> > byte = ((unsigned char *) VARDATA_ANY(v))[n];
>
> I think this is taking it too far. We shouldn't try to make all
> similar messages use the same placeholders. If the underlying types
> are different, we should use them. Adding more casts makes the code
> less robust overall. The size_t/ssize_t cleanup is different, because
> there the types were arguably wrong to begin with, and by using the
> right types we move toward more consistency.

Ouch! Understood. They treat byte and bit locations accordingly. I
agree that it's too far. Removed.

> > diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
> > index 6f0814d9ac..feb4d5dcf4 100644
> > --- a/src/bin/pg_combinebackup/pg_combinebackup.c
> > +++ b/src/bin/pg_combinebackup/pg_combinebackup.c
> > - pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
> > - filename, rb, (long long int) st.st_size);
> > + /* cast st_size to avoid extra translatable messages */
> > + pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
> > + filename, rb, (size_t) st.st_size);
> > }
> > /* Adjust buffer length for new data and restore trailing-\0 invariant */
>
> Similar to above, casting off_t to size_t is dubious.

The same discussion regarding the change in twophase.c is also
applicable to this change. I applied the same amendment.

> > diff --git a/src/port/user.c b/src/port/user.c
> > index 7444aeb64b..9364bdb69e 100644
> > --- a/src/port/user.c
> > +++ b/src/port/user.c
> > @@ -40,8 +40,8 @@ pg_get_user_name(uid_t user_id, char *buffer, size_t
> > buflen)
> > }
> > if (pwerr != 0)
> > snprintf(buffer, buflen,
> > - _("could not look up local user ID %d: %s"),
> > - (int) user_id,
> > + _("could not look up local user ID %ld: %s"),
> > + (long) user_id,
> > strerror_r(pwerr, pwdbuf, sizeof(pwdbuf)));
> > else
> > snprintf(buffer, buflen,
>
> Also dubious use of "long" here.

Okay, used %d instead. In addition to that, I removed the casts from
uid_t expecting that compilers will detect the change of the
definition of uid_t.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
unify_placeholders_of_some_messages_v4.patch	text/x-patch	16.3 KB

In response to

Re: Inconsistent printf placeholders at 2024-03-19 09:50:23 from Peter Eisentraut

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	jian he	2024-03-21 08:25:15	Re: SQL:2011 application time
Previous Message	Dominique Devienne	2024-03-21 07:59:29	Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs