pgsql: Add 'noError' argument to encoding conversion functions.

From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Add 'noError' argument to encoding conversion functions.
Date: 2021-04-01 09:25:00
Message-ID: E1lRtZU-0000J1-WB@gemulon.postgresql.org
Lists: pgsql-committers

Add 'noError' argument to encoding conversion functions.

With the 'noError' argument, you can try to convert a buffer without
knowing the character boundaries beforehand. The functions now need to
return the number of input bytes successfully converted.
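
As a rough illustration of the idea (not the actual PostgreSQL conversion
function signature), the sketch below shows a toy UTF-8 to Latin-1 converter.
The name toy_utf8_to_latin1, its parameters, and the -1 return value that
stands in for raising an error are all assumptions made for this example.
With no_error set, the function stops at the first byte it cannot convert,
such as a multibyte character cut off at the end of the buffer, and returns
the number of input bytes it did convert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy converter, NOT PostgreSQL's real API.
 *
 * Converts UTF-8 input to Latin-1. Only ASCII and the two-byte sequences
 * for U+0080..U+00FF (lead bytes 0xC2 and 0xC3) are convertible.
 *
 * Returns the number of input bytes successfully converted. If an
 * invalid, untranslatable, or truncated character is found:
 *   - no_error == true:  stop there and return the bytes converted so far
 *   - no_error == false: return -1 (stand-in for reporting an error)
 *
 * dst must have room for len + 1 bytes.
 */
static int
toy_utf8_to_latin1(const unsigned char *src, int len,
                   unsigned char *dst, bool no_error)
{
    int consumed = 0;

    assert(src != NULL && dst != NULL && len >= 0);

    while (consumed < len)
    {
        unsigned char c = src[consumed];

        if (c < 0x80)
        {
            /* Plain ASCII maps to itself. */
            *dst++ = c;
            consumed += 1;
        }
        else if ((c == 0xC2 || c == 0xC3) &&
                 consumed + 1 < len &&
                 (src[consumed + 1] & 0xC0) == 0x80)
        {
            /* Complete two-byte sequence encoding U+0080..U+00FF. */
            *dst++ = (unsigned char) (((c & 0x03) << 6) |
                                      (src[consumed + 1] & 0x3F));
            consumed += 2;
        }
        else
        {
            /* Invalid, untranslatable, or truncated character. */
            if (!no_error)
                return -1;
            break;
        }
    }

    *dst = '\0';
    return consumed;
}
```

A caller that reads input in fixed-size chunks could pass no_error = true,
keep the unconverted tail, and retry it once more input has arrived.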

This is a backwards-incompatible change if you have created a custom
encoding conversion with CREATE CONVERSION. This adds a check to
pg_upgrade for that, refusing the upgrade if there are any user-defined
encoding conversions. Custom conversions are very rare; there are no
commonly used extensions that I know of that use that feature. No other
objects can depend on conversions, so if you do have one, you can fairly
easily drop it before upgrading, and recreate it after the upgrade with
an updated version.

Add regression tests for built-in encoding conversions. This doesn't cover
every conversion, but it covers all the internal functions in conv.c that
are used to implement the conversions.

Reviewed-by: John Naylor
Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ea1b99a6619cd9dcfd46b82ac0d926b0b80e0ae9

Modified Files
--------------
doc/src/sgml/ref/create_conversion.sgml | 12 +-
src/backend/commands/conversioncmds.c | 32 +-
src/backend/utils/error/elog.c | 2 +
src/backend/utils/mb/conv.c | 139 +++++-
.../cyrillic_and_mic/cyrillic_and_mic.c | 127 +++--
.../euc2004_sjis2004/euc2004_sjis2004.c | 94 +++-
.../euc_cn_and_mic/euc_cn_and_mic.c | 57 ++-
.../euc_jp_and_sjis/euc_jp_and_sjis.c | 153 ++++--
.../euc_kr_and_mic/euc_kr_and_mic.c | 57 ++-
.../euc_tw_and_big5/euc_tw_and_big5.c | 165 +++++--
.../latin2_and_win1250/latin2_and_win1250.c | 49 +-
.../conversion_procs/latin_and_mic/latin_and_mic.c | 43 +-
.../conversion_procs/utf8_and_big5/utf8_and_big5.c | 37 +-
.../utf8_and_cyrillic/utf8_and_cyrillic.c | 67 +--
.../utf8_and_euc2004/utf8_and_euc2004.c | 37 +-
.../utf8_and_euc_cn/utf8_and_euc_cn.c | 37 +-
.../utf8_and_euc_jp/utf8_and_euc_jp.c | 37 +-
.../utf8_and_euc_kr/utf8_and_euc_kr.c | 37 +-
.../utf8_and_euc_tw/utf8_and_euc_tw.c | 37 +-
.../utf8_and_gb18030/utf8_and_gb18030.c | 37 +-
.../conversion_procs/utf8_and_gbk/utf8_and_gbk.c | 37 +-
.../utf8_and_iso8859/utf8_and_iso8859.c | 43 +-
.../utf8_and_iso8859_1/utf8_and_iso8859_1.c | 35 +-
.../utf8_and_johab/utf8_and_johab.c | 37 +-
.../conversion_procs/utf8_and_sjis/utf8_and_sjis.c | 37 +-
.../utf8_and_sjis2004/utf8_and_sjis2004.c | 37 +-
.../conversion_procs/utf8_and_uhc/utf8_and_uhc.c | 37 +-
.../conversion_procs/utf8_and_win/utf8_and_win.c | 43 +-
src/backend/utils/mb/mbutils.c | 79 +++-
src/bin/pg_upgrade/check.c | 95 ++++
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_proc.dat | 332 ++++++-------
src/include/mb/pg_wchar.h | 35 +-
src/test/regress/expected/conversion.out | 519 +++++++++++++++++++++
src/test/regress/expected/opr_sanity.out | 7 +-
src/test/regress/input/create_function_1.source | 4 +
src/test/regress/output/create_function_1.source | 3 +
src/test/regress/regress.c | 134 ++++++
src/test/regress/sql/conversion.sql | 185 ++++++++
src/test/regress/sql/opr_sanity.sql | 7 +-
40 files changed, 2333 insertions(+), 631 deletions(-)
