pgsql: Replace pg_mblen() with bounds-checked versions.

From: Thomas Munro <tmunro(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Replace pg_mblen() with bounds-checked versions.
Date: 2026-02-09 00:07:46
Message-ID: E1vpEoc-001x3U-14@gemulon.postgresql.org
Lists: pgsql-committers

Replace pg_mblen() with bounds-checked versions.

A corrupted string could cause code that iterates with pg_mblen() to
overrun its buffer. Fix this by converting all callers to one of the
following:

1. Callers with a null-terminated string now use pg_mblen_cstr(), which
raises an "illegal byte sequence" error if it finds a terminator in the
middle of the sequence.

2. Callers with a length or end pointer now use either
pg_mblen_with_len() or pg_mblen_range(), for the same effect, depending
on which of the two seems more convenient at each site.

3. A small number of cases pre-validate a string, and can use
pg_mblen_unbounded().
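
As an illustration only, the bounded variants described above can be
sketched in standalone C. This is not PostgreSQL's implementation: every
toy_* name below is hypothetical, only UTF-8 is handled, and the sketch
returns -1 where the real functions are described as raising an "illegal
byte sequence" error.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: length implied by a UTF-8 lead byte alone.
 * Like an unbounded lookup, it trusts that the bytes are really there.
 */
static int
toy_utf8_mblen(const unsigned char *p)
{
	if ((*p & 0x80) == 0x00)
		return 1;
	if ((*p & 0xE0) == 0xC0)
		return 2;
	if ((*p & 0xF0) == 0xE0)
		return 3;
	if ((*p & 0xF8) == 0xF0)
		return 4;
	return 1;					/* invalid lead byte: step over one byte */
}

/* End-pointer variant: -1 if the character would extend past "end". */
static int
toy_mblen_range(const char *p, const char *end)
{
	int			len;

	assert(p < end);			/* caller must pass a non-empty range */
	len = toy_utf8_mblen((const unsigned char *) p);
	if (len > end - p)
		return -1;
	return len;
}

/* Length variant: same check, expressed with a remaining byte count. */
static int
toy_mblen_with_len(const char *p, size_t limit)
{
	return toy_mblen_range(p, p + limit);
}

/* C-string variant: -1 if a terminator appears inside the character. */
static int
toy_mblen_cstr(const char *p)
{
	int			len = toy_utf8_mblen((const unsigned char *) p);

	/* Stops at the first NUL, so it never reads past the terminator. */
	for (int i = 1; i < len; i++)
	{
		if (p[i] == '\0')
			return -1;
	}
	return len;
}

/* Example caller: count characters in s[0..n), or -1 if truncated. */
static int
toy_count_chars(const char *s, size_t n)
{
	const char *end = s + n;
	int			count = 0;

	while (s < end)
	{
		int			len = toy_mblen_range(s, end);

		if (len < 0)
			return -1;
		s += len;
		count++;
	}
	return count;
}
```

With the unchecked helper alone, a loop like the one in toy_count_chars()
would advance past "end" whenever the final lead byte promises more
bytes than the buffer holds. The bounded variants turn that case into a
reported failure instead.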

The traditional pg_mblen() function and COPYCHAR macro still exist for
backward compatibility, but are no longer used by core code and are
hereby deprecated. The same applies to the t_isXXX() functions.

Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Co-authored-by: Noah Misch <noah(at)leadboat(dot)com>
Reviewed-by: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)

Branch
------
REL_16_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/d837fb02925561091a70c5a6a74f42da57a022f9

Modified Files
--------------
contrib/btree_gist/btree_utils_var.c | 21 +++--
contrib/dict_xsyn/dict_xsyn.c | 8 +-
contrib/hstore/hstore_io.c | 2 +-
contrib/ltree/lquery_op.c | 4 +-
contrib/ltree/ltree.h | 2 +-
contrib/ltree/ltree_io.c | 16 ++--
contrib/ltree/ltxtquery_io.c | 4 +-
contrib/pageinspect/heapfuncs.c | 2 +-
contrib/pg_trgm/trgm.h | 4 +-
contrib/pg_trgm/trgm_op.c | 48 ++++++----
contrib/pg_trgm/trgm_regexp.c | 21 +++--
contrib/unaccent/unaccent.c | 7 +-
src/backend/catalog/pg_proc.c | 2 +-
src/backend/tsearch/dict_synonym.c | 8 +-
src/backend/tsearch/dict_thesaurus.c | 18 ++--
src/backend/tsearch/regis.c | 37 ++++----
src/backend/tsearch/spell.c | 123 ++++++++++++-------------
src/backend/tsearch/ts_locale.c | 109 ++++++++--------------
src/backend/tsearch/ts_utils.c | 4 +-
src/backend/tsearch/wparser_def.c | 3 +-
src/backend/utils/adt/encode.c | 6 +-
src/backend/utils/adt/formatting.c | 22 ++---
src/backend/utils/adt/jsonfuncs.c | 2 +-
src/backend/utils/adt/jsonpath_gram.y | 3 +-
src/backend/utils/adt/levenshtein.c | 14 +--
src/backend/utils/adt/like.c | 18 ++--
src/backend/utils/adt/like_match.c | 3 +-
src/backend/utils/adt/oracle_compat.c | 33 ++++---
src/backend/utils/adt/regexp.c | 9 +-
src/backend/utils/adt/tsquery.c | 25 +++---
src/backend/utils/adt/tsvector.c | 11 +--
src/backend/utils/adt/tsvector_op.c | 10 ++-
src/backend/utils/adt/tsvector_parser.c | 29 +++---
src/backend/utils/adt/varbit.c | 8 +-
src/backend/utils/adt/varlena.c | 34 ++++---
src/backend/utils/adt/xml.c | 11 ++-
src/backend/utils/mb/mbutils.c | 150 +++++++++++++++++++++++++++++--
src/include/mb/pg_wchar.h | 7 ++
src/include/tsearch/ts_locale.h | 36 ++++++--
src/include/tsearch/ts_utils.h | 14 ++-
src/test/modules/test_regex/test_regex.c | 3 +-
41 files changed, 532 insertions(+), 359 deletions(-)
