Re: Use ereport() instead of elog() for invalid weights in setweight()

From: Ewan Young <kdbase(dot)hack(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Use ereport() instead of elog() for invalid weights in setweight()
Date: 2026-06-04 02:32:45
Message-ID: CAON2xHOdAn73qVy6eBGnRHsfKG_70r=uTB_aTp4undkBpTJVQA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 4, 2026 at 2:17 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Ewan Young <kdbase(dot)hack(at)gmail(dot)com> writes:
> > I noticed that setweight() reports an internal error (SQLSTATE XX000)
> > when the weight argument is not one of A/a, B/b, C/c, D/d, even though
> > the weight comes directly from user input. The two-argument variant
> > also prints the weight as a raw ASCII code, which is a bit unfriendly:
>
> > =# SELECT setweight('cat:1'::tsvector, 'p');
> > ERROR: unrecognized weight: 112
>
> I agree that these ought to be ereport()s. However, I suspect that
> the reason for printing bogus weights numerically was to avoid the
> risk of generating encoding-incorrect strings if the given char
> value has its high bit set. The existing code in tsvector_filter
> is failing to consider that hazard.

Ah, I hadn't considered that. You're right: in a multibyte encoding
the bogus byte could well be a fragment of a multibyte character, so
printing it with %c would inject an invalidly-encoded byte into the
error message. The style used in v2 (matching charout()) keeps
the message pure ASCII, which seems clearly safer.

>
> I experimented with making the error messages print non-ASCII
> characters differently, and soon decided that that added enough
> complexity that we shouldn't have three copies of it. So the
> attached proposed v2 also factors the code out into a new
> function parse_weight (maybe a different name would be better?).

Factoring it out looks like a clear improvement. parse_weight reads
fine to me; I don't think it's worth bikeshedding.
I tested v2 on top of current master:
- applies cleanly, builds without new warnings
- core regression suite passes
- manually exercised the error paths, and it works:

=# \set VERBOSITY verbose
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: 22023: unrecognized weight: "p"
LOCATION: parse_weight, tsvector_op.c:236
=# SELECT setweight('cat:1'::tsvector, '\200');
ERROR: 22023: unrecognized weight: "\200"
LOCATION: parse_weight, tsvector_op.c:240

>
> I'm unconvinced that we really need a regression test case for
> this ...

Agreed, no objection to dropping it. The behavior worth checking is
the message formatting, which is easy enough to verify by hand.

>
> regards, tom lane
>

So v2 looks good to me. Thanks for improving the patch!

Best regards,
Ewan Young

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message cca5507 2026-06-04 02:43:47 Re: Use streaming read I/O when enabling data checksums online
Previous Message Bruce Momjian 2026-06-04 02:27:01 Re: First draft of PG 19 release notes