| From: | Ewan Young <kdbase(dot)hack(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz> |
| Subject: | Re: Use ereport() instead of elog() for invalid weights in setweight() |
| Date: | 2026-06-04 02:32:45 |
| Message-ID: | CAON2xHOdAn73qVy6eBGnRHsfKG_70r=uTB_aTp4undkBpTJVQA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu, Jun 4, 2026 at 2:17 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Ewan Young <kdbase(dot)hack(at)gmail(dot)com> writes:
> > I noticed that setweight() reports an internal error (SQLSTATE XX000)
> > when the weight argument is not one of A/a, B/b, C/c, D/d, even though
> > the weight comes directly from user input. The two-argument variant
> > also prints the weight as a raw ASCII code, which is a bit unfriendly:
>
> > =# SELECT setweight('cat:1'::tsvector, 'p');
> > ERROR: unrecognized weight: 112
>
> I agree that these ought to be ereport()s. However, I suspect that
> the reason for printing bogus weights numerically was to avoid the
> risk of generating encoding-incorrect strings if the given char
> value has its high bit set. The existing code in tsvector_filter
> is failing to consider that hazard.
Ah, I hadn't considered that. You're right: in a multibyte encoding
the bogus byte could well be a fragment of a multibyte character, so
printing it with %c would inject an invalidly-encoded byte into the
error message. The style used in v2 (matching charout()) keeps
the message pure ASCII, which seems clearly safer.
>
> I experimented with making the error messages print non-ASCII
> characters differently, and soon decided that that added enough
> complexity that we shouldn't have three copies of it. So the
> attached proposed v2 also factors the code out into a new
> function parse_weight (maybe a different name would be better?).
Factoring it out looks like a clear improvement. parse_weight reads
fine to me; I don't think it's worth bikeshedding.
I tested v2 on top of current master:
- applies cleanly, builds without new warnings
- core regression suite passes
- manually exercised the error paths, and it works:
=# \set VERBOSITY verbose
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: 22023: unrecognized weight: "p"
LOCATION: parse_weight, tsvector_op.c:236
=# SELECT setweight('cat:1'::tsvector, '\200');
ERROR: 22023: unrecognized weight: "\200"
LOCATION: parse_weight, tsvector_op.c:240
>
> I'm unconvinced that we really need a regression test case for
> this ...
Agreed, no objection to dropping it. The behavior worth checking is
the message formatting, which is easy enough to verify by hand.
>
> regards, tom lane
>
So v2 looks good to me. Thanks for improving the patch!
Best regards,
Ewan Young
| From | Date | Subject | |
|---|---|---|---|
| Next Message | cca5507 | 2026-06-04 02:43:47 | Re: Use streaming read I/O when enabling data checksums online |
| Previous Message | Bruce Momjian | 2026-06-04 02:27:01 | Re: First draft of PG 19 release notes |