Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Quan Zongliang <quanzongliang(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: bugfix: invalid bit/varbit input causes the log file to be unreadable
Date: 2020-06-26 16:45:43
Message-ID: 19646.1593189943@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Quan Zongliang <quanzongliang(at)gmail(dot)com> writes:
> The reason is that the error message in the bit_in / varbit_in function
> is output directly using %c. Causes the log file to not be decoded
> correctly.

> The attachment is a patch.

I'm really quite skeptical of the premise here. We do not guarantee that
the postmaster log file is valid in any particular encoding; it'd be
nearly impossible to do so if the cluster contains databases using
different encodings. So I think you'd be way better off to reformulate
your log-reading code to be less fragile.

Even granting the premise, the proposed patch seems like a significant
decrease in user-friendliness for typical cases. I'd rather see us
make an effort to print one valid-per-the-DB-encoding character.
Now that we can rely on snprintf to count %s restrictions in bytes,
I think something like this should work:

errmsg("\"%.*s\" is not a valid binary digit",
pg_mblen(sp), sp)));

But the real problem is that this is only the tip of the iceberg.
You didn't even hit all the %c usages in varbit.c. A quick grep finds
these other spots that can doubtless be made to do the same thing:

acl.c:899: elog(ERROR, "unrecognized objtype abbreviation: %c", objtypec);
arrayfuncs.c:507: errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:554: errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:584: errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:591: errdetail("Unmatched \"%c\" character.", '}')));
arrayfuncs.c:633: errdetail("Unexpected \"%c\" character.",
encode.c:184: errmsg("invalid hexadecimal digit: \"%c\"", c)));
encode.c:341: errmsg("invalid symbol \"%c\" while decoding base64 sequence", (int) c)));
formatting.c:3298: errmsg("unmatched format separator \"%c\"",
jsonpath_gram.c:2390: errdetail("unrecognized flag character \"%c\" in LIKE_REGEX predicate",
regexp.c:426: errmsg("invalid regular expression option: \"%c\"",
tsvector_op.c:312: elog(ERROR, "unrecognized weight: %c", char_weight);
tsvector_op.c:872: errmsg("unrecognized weight: \"%c\"", char_weight)));
varbit.c:233: errmsg("\"%c\" is not a valid binary digit",
varbit.c:258: errmsg("\"%c\" is not a valid hexadecimal digit",
varbit.c:534: errmsg("\"%c\" is not a valid binary digit",
varbit.c:559: errmsg("\"%c\" is not a valid hexadecimal digit",
varlena.c:5589: errmsg("unrecognized format() type specifier \"%c\"",
varlena.c:5710: errmsg("unrecognized format() type specifier \"%c\"",

and that's just in src/backend/utils/adt/.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-06-26 16:55:03 Re: PG 13 release notes, first draft
Previous Message Bruce Momjian 2020-06-26 16:37:26 Re: Default setting for enable_hashagg_disk