| From: | "Tristan Partin" <tristan(at)partin(dot)io> |
|---|---|
| To: | "Jeff Davis" <pgsql(at)j-davis(dot)com> |
| Cc: | "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: dict_synonym.c: fix truncation of multibyte sequence |
| Date: | 2026-06-05 20:46:00 |
| Message-ID: | DJ1ERDAO9GGX.3TNILXYMEE8KO@partin.io |
| Lists: | pgsql-hackers |
On Fri Jun 5, 2026 at 5:37 PM UTC, Jeff Davis wrote:
> On Fri, 2026-06-05 at 15:57 +0000, Tristan Partin wrote:
>> > In any case, the input comes from a trusted
>> > source (dictionary configuration), so it's not very serious.
>>
>> The fix looks and sounds good. Do we have any way to test this, so it
>> doesn't regress in the future?
>
> -- Ⱥ is 2 bytes, 'ⱥ' is 3 bytes
> $ echo "foo barȺ" > /path/to/postgres/share/tsearch_data/mbtest.syn
>
> CREATE TEXT SEARCH DICTIONARY mb_syn (
> TEMPLATE = synonym,
> SYNONYMS = mbtest);
>
> SELECT ts_lexize('mb_syn', 'foo');
>
> =# SELECT ts_lexize('mb_syn', 'foo'); -- before patch
> ts_lexize
> -----------
> {bar}
> (1 row)
>
> =# SELECT ts_lexize('mb_syn', 'foo'); -- after patch
> ts_lexize
> -----------
> {barⱥ}
> (1 row)
>
> It requires a specially-crafted synonym file, and I'm not sure it's
> worth much effort to add a test for this specific path. If we see
> similar bugs, it's more likely to be somewhere else that makes the same
> faulty assumption.
>
> If you do think we should add tests, we should probably add a set of
> dictionary-related files (.syn, .dict, .ths, etc.) that contain a
> variety of adversarial Unicode cases.
>
> I'd be inclined to just commit this fix for now. It needs backpatching,
> and I don't think we want to backpatch a large set of tests with it.

I would say proceed as you see fit. I am generally of the opinion that
additional testing is better, but I don't want to push for something if
others don't see the same value. I'd be happy to provide a patch for the
test in a subsequent discussion, if that is a good middle ground.
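
For reference, the byte-length change in the example above can be reproduced outside PostgreSQL. This is only a minimal Python sketch, not the code in dict_synonym.c. The helper name utf8_len is hypothetical, and the last step assumes the faulty assumption is that lowercasing preserves byte length, which the thread implies but does not spell out.

```python
def utf8_len(s: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


original = "bar\u023a"     # "barȺ": U+023A takes 2 bytes in UTF-8
folded = original.lower()  # "barⱥ": U+2C65 takes 3 bytes in UTF-8

# Assumption for illustration: a caller keeps only as many bytes of the
# lowercased string as the original had, cutting the last character short.
truncated = folded.encode("utf-8")[:utf8_len(original)]

print(utf8_len(original), utf8_len(folded))
# errors="ignore" drops the incomplete trailing byte sequence.
print(truncated.decode("utf-8", errors="ignore"))
```

The lowercased string is one byte longer than the input, so a buffer or length computed from the original cannot hold the whole last character.
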
--
Tristan Partin
PostgreSQL Contributors Team
AWS (https://aws.amazon.com)