Re: dict_synonym.c: fix truncation of multibyte sequence

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Tristan Partin <tristan(at)partin(dot)io>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: dict_synonym.c: fix truncation of multibyte sequence
Date:	2026-06-05 17:37:03
Message-ID:	8cf296c265a367e08bf221781c4ba6c3f3726fda.camel@j-davis.com
Lists:	pgsql-hackers

On Fri, 2026-06-05 at 15:57 +0000, Tristan Partin wrote:
> > In any case, the input comes from a trusted
> > source (dictionary configuration), so it's not very serious.
>
> The fix looks and sounds good. Do we have any way to test this, so it
> doesn't regress in the future?

-- 'Ⱥ' (U+023A) is 2 bytes in UTF-8; its lowercase form 'ⱥ' (U+2C65) is 3 bytes
$ echo "foo barȺ" > /path/to/postgres/share/tsearch_data/mbtest.syn

CREATE TEXT SEARCH DICTIONARY mb_syn (
    TEMPLATE = synonym,
    SYNONYMS = mbtest);

SELECT ts_lexize('mb_syn', 'foo');

=# SELECT ts_lexize('mb_syn', 'foo'); -- before patch
ts_lexize
-----------
{bar}
(1 row)

=# SELECT ts_lexize('mb_syn', 'foo'); -- after patch
ts_lexize
-----------
{barⱥ}
(1 row)
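
For anyone who wants to double-check the byte counts in the repro above, here is a minimal sketch in Python. It is not part of the patch. It assumes UTF-8 and uses Python's str.lower() as a stand-in for the server's lowercasing, which in PostgreSQL depends on the locale and provider in use.

```python
# Sketch only: checks the UTF-8 byte lengths quoted in the repro.
# Assumption: Python's str.lower() stands in for the server-side
# lowercasing; PostgreSQL's actual result depends on locale/provider.
upper = "\u023a"          # 'Ⱥ' LATIN CAPITAL LETTER A WITH STROKE
lowered = upper.lower()   # 'ⱥ' LATIN SMALL LETTER A WITH STROKE (U+2C65)

upper_len = len(upper.encode("utf-8"))
lowered_len = len(lowered.encode("utf-8"))

# The synonym from mbtest.syn, before and after lowercasing.
word = "bar" + upper
word_len = len(word.encode("utf-8"))
word_lowered_len = len(word.lower().encode("utf-8"))

print(f"{upper}: {upper_len} bytes, {lowered}: {lowered_len} bytes")
print(f"{word}: {word_len} bytes, {word.lower()}: {word_lowered_len} bytes")
```

The lowercased synonym is one byte longer than the original, which is why this input is a useful probe for length handling.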

It requires a specially-crafted synonym file, and I'm not sure it's
worth much effort to add a test for this specific path. If we see
similar bugs, it's more likely to be somewhere else that makes the same
faulty assumption.

If you do think we should add tests, we should probably add a set of
dictionary-related files (.syn, .dict, .ths, etc.) that contain a
variety of adversarial Unicode cases.
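
As a rough starting point for picking such cases, the sketch below lists characters whose lowercase form has a different UTF-8 byte length. The function name length_changing_lowercase is made up for illustration, and Python's case mapping is only an approximation of what the server does under a given locale or provider, so any candidate would need to be verified against PostgreSQL itself.

```python
# Sketch only: find characters whose lowercase form changes UTF-8 length.
# Assumption: Python's Unicode case mapping approximates the server's;
# the helper name below is hypothetical, not anything in PostgreSQL.
import sys


def length_changing_lowercase(limit=sys.maxunicode + 1):
    """Return (char, lowered) pairs whose UTF-8 byte length differs."""
    found = []
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        ch = chr(cp)
        low = ch.lower()
        if low != ch and len(low.encode("utf-8")) != len(ch.encode("utf-8")):
            found.append((ch, low))
    return found


# Restrict the scan to code points below U+3000 to keep the list short.
candidates = length_changing_lowercase(0x3000)
for ch, low in candidates:
    before = len(ch.encode("utf-8"))
    after = len(low.encode("utf-8"))
    print(f"U+{ord(ch):04X} {ch} -> {low} ({before} -> {after} bytes)")
```

The list includes both growing cases (such as 'Ⱥ') and shrinking ones (such as the Kelvin sign U+212A, which lowercases to ASCII 'k'), so it exercises length assumptions in both directions.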

I'd be inclined to just commit this fix for now. It needs backpatching,
and I don't think we want to backpatch a large set of tests with it.

Regards,
Jeff Davis
