pgsql: Avoid doing encoding conversions by double-conversion via MULE_I

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Avoid doing encoding conversions by double-conversion via MULE_I
Date: 2015-11-28 18:42:41
Message-ID: E1a2kSH-0002H7-Uy@gemulon.postgresql.org
Lists: pgsql-committers

Avoid doing encoding conversions by double-conversion via MULE_INTERNAL.

Previously, we did many conversions for Cyrillic and Central European
single-byte encodings by converting to a related MULE_INTERNAL coding
scheme before converting to the destination. This seems unnecessarily
inefficient. Moreover, if the conversion encounters an untranslatable
character, the error message will confusingly complain about failure
to convert to or from MULE_INTERNAL, rather than the user-visible
encodings. Worse still, this approach results in some completely
unnecessary conversion failures; there are cases where the chosen
MULE subset lacks characters that exist in both of the user-visible
encodings, causing a conversion failure that need not occur.
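
To make the failure mode concrete, here is a toy model of a two-step
conversion through a pivot encoding. It is only a sketch under stated
assumptions: the function name convert_via_pivot, the 256-entry tables, and
the single-byte pivot are all hypothetical simplifications (real
MULE_INTERNAL codes are multi-byte), and this is not the code that was in
the tree.

```c
#include <stddef.h>

/*
 * Toy model (hypothetical, not PostgreSQL code) of converting one byte
 * through an intermediate "pivot" encoding.
 *
 * Both tables are indexed by the full byte value; an entry of 0 means
 * "no mapping".  Assumption: the pivot is modeled as a single byte.
 *
 * Returns 0 on success, 1 if source -> pivot failed,
 * 2 if pivot -> destination failed.
 */
int
convert_via_pivot(unsigned char c,
                  const unsigned char to_pivot[256],
                  const unsigned char from_pivot[256],
                  unsigned char *out)
{
    unsigned char mid = to_pivot[c];

    if (mid == 0)
        return 1;       /* failure is attributed to the pivot encoding */
    if (from_pivot[mid] == 0)
        return 2;       /* likewise: the pivot code has no destination */
    *out = from_pivot[mid];
    return 0;
}
```

In this model, a byte whose character exists in both the source and the
destination encodings still fails with code 1 whenever the pivot subset has
no slot for it, and either failure can only be described in terms of the
pivot.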

This patch fixes the first two of those deficiencies by introducing
a new local2local() conversion support subroutine for direct conversion
between any two single-byte character sets, and adding new conversion
tables where needed. However, I generated the new conversion tables by
testing PG 9.5's behavior, so that the actual conversion behavior is
bug-compatible with previous releases; the only user-visible behavior
change is that the error messages for conversion failures are saner.
Changes in the conversion behavior will probably ensue after discussion.
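
As a rough illustration of what direct table-driven conversion between two
single-byte character sets can look like, consider the sketch below. The
function name sb_convert, its signature, and the table layout (128 entries
for the high-bit bytes, 0 meaning "no equivalent") are assumptions made for
this example; it is not the local2local() code from this commit.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of single-step conversion between two single-byte
 * encodings using one lookup table.
 *
 * tab[c - 0x80] holds the destination byte for high-bit source byte c,
 * or 0 if c has no equivalent in the destination encoding (assumed
 * layout).  Bytes below 0x80 are treated as ASCII and copied unchanged.
 *
 * Returns -1 on success, or the offset of the first untranslatable byte,
 * so the caller can report the error against the real source and
 * destination encodings.
 */
long
sb_convert(const unsigned char *src, unsigned char *dst, size_t len,
           const unsigned char tab[128])
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        unsigned char c = src[i];

        if (c < 0x80)
            dst[i] = c;                 /* ASCII passes through */
        else if (tab[c - 0x80] != 0)
            dst[i] = tab[c - 0x80];     /* direct one-step mapping */
        else
            return (long) i;            /* untranslatable byte */
    }
    return -1;
}
```

With a single table per encoding pair there is no intermediate form, so a
failure can only be described in terms of the two encodings the caller
asked for.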

Interestingly, although this approach requires more tables, the .so files
actually end up smaller (at least on my x86_64 machine); the tables are
smaller than the management code needed for double conversion.

Per a complaint from Albe Laurenz.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/8d32717b6bfaeda5b88b338dae728b47da19f4bb

Modified Files
--------------
src/backend/utils/mb/conv.c                        |  55 +-
.../cyrillic_and_mic/cyrillic_and_mic.c            | 540 ++++++++++----------
.../latin2_and_win1250/latin2_and_win1250.c        | 136 ++---
.../conversion_procs/latin_and_mic/latin_and_mic.c |  54 +-
src/include/mb/pg_wchar.h                          |   2 +
5 files changed, 376 insertions(+), 411 deletions(-)
