
Re: What is the maximum encoding-conversion growth rate, anyway?

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What is the maximum encoding-conversion growth rate, anyway?
Date:	2007-05-29 00:19:18
Message-ID:	20070529.091918.59670400.t-ishii@sraoss.co.jp
Lists:	pgsql-hackers

> I just rearranged the code in mbutils.c a little bit to make it more
> robust if conversion of an over-length string is attempted, and noted
> this comment:
>
> /*
> * When converting strings between different encodings, we assume that space
> * for converted result is 4-to-1 growth in the worst case. The rate for
> * currently supported encoding pairs are within 3 (SJIS JIS X0201 half width
> * kanna -> UTF8 is the worst case). So "4" should be enough for the moment.
> *
> * Note that this is not the same as the maximum character width in any
> * particular encoding.
> */
> #define MAX_CONVERSION_GROWTH 4
>
> It strikes me that this is overly pessimistic, since we do not support
> 5- or 6-byte UTF8 characters, and AFAICS there are no 1-byte characters
> in any supported encoding that require 4 bytes in another. Could we
> reduce the multiplier to 3? Or even 2? This has a direct impact on the
> longest COPY lines we can support, so I'd like it not to be larger than
> necessary.

I'm afraid we have to make it larger, rather than smaller, for 8.3. For
example, 0x82f5 in SHIFT_JIS_2004 (new in 8.3) becomes a *pair* of
3-byte UTF8 characters (0x00e3818b and 0x00e3829a). See
util/mb/Unicode/shift_jis_2004_to_utf8_combined.map for more details.

So the worst case is now 6, rather than 3.
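
The arithmetic behind that case can be checked with a minimal sketch.
It assumes the two UTF8 sequences quoted above decode to the code
points U+304B and U+309A (which is what the bytes e3 81 8b and
e3 82 9a represent); the function names are illustrative and are not
taken from mbutils.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[]; returns the number of bytes. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0x10FFFF);     /* 5- and 6-byte forms are not supported */
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/* Total UTF-8 length, in bytes, of a sequence of n code points. */
size_t utf8_length(const uint32_t *cps, size_t n)
{
    unsigned char scratch[4];
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += utf8_encode(cps[i], scratch);
    return total;
}
```

Calling utf8_length() on the pair {0x304B, 0x309A} gives 6 output
bytes for the single source character 0x82f5. Since that source
character occupies 2 bytes, the figure is 6 when counted per source
character and 3 when counted per source byte, so it matters which of
the two the multiplier is applied to.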

Can we add a column to pg_conversion that represents the "growth
rate"? This would make the rate for most encodings much smaller than
6.
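
As a rough sketch of how a per-conversion growth rate might be used,
the following sizes the output buffer from a factor passed in by the
caller instead of a global constant. Everything here is hypothetical:
the function names are made up for illustration, no such column
exists in pg_conversion, and MAX_ALLOC_SIZE is assumed to mirror the
backend's allocation limit of 1 GB - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stands in for the backend's maximum allocation size. */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Longest source string (in bytes) whose converted form, plus a
 * terminating NUL, is guaranteed to fit for the given growth factor.
 */
size_t max_source_len(unsigned growth)
{
    assert(growth > 0);
    return (MAX_ALLOC_SIZE - 1) / growth;
}

/*
 * Buffer size needed to convert src_len bytes with the given growth
 * factor, or 0 if the worst case would exceed MAX_ALLOC_SIZE.
 */
size_t conversion_buffer_size(size_t src_len, unsigned growth)
{
    if (src_len > max_source_len(growth))
        return 0;               /* caller should report "string too long" */
    return src_len * growth + 1;
}
```

Under these assumptions, max_source_len() comes to about 256 MB for a
factor of 4 and about 170 MB for a factor of 6, which is why a
smaller per-conversion value would help the longest COPY line for
most encoding pairs.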
--
Tatsuo Ishii
SRA OSS, Inc. Japan

In response to

What is the maximum encoding-conversion growth rate, anyway? at 2007-05-28 16:53:49 from Tom Lane

Responses

Re: What is the maximum encoding-conversion growth rate, anyway? at 2007-05-29 02:23:42 from Tom Lane
