Re: What is the maximum encoding-conversion growth rate, anyway?

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: bruce(at)momjian(dot)us
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is the maximum encoding-conversion growth rate, anyway?
Date: 2007-07-18 09:48:24
Message-ID: 20070718.184824.56046464.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

The conclusion of the discussion appears to be that we could safely
reduce MAX_CONVERSION_GROWTH from 4 to 3 for all existing built-in
conversions.
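
To make the 3-to-1 worst case concrete, here is a minimal sketch of the
case named in the quoted comment: a single-byte JIS X 0201 half-width
katakana character in SJIS becoming a 3-byte UTF-8 sequence. The function
name halfwidth_kana_to_utf8 is made up for illustration and is not the
built-in conversion code. It relies on the standard mapping of bytes
0xA1..0xDF to U+FF61..U+FF9F.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL code: convert one JIS X 0201
 * half-width katakana byte (0xA1..0xDF in SJIS) to UTF-8.
 * Returns the number of bytes written to dst (always 3 for a valid
 * input byte), or 0 if the byte is outside the half-width kana range.
 */
size_t
halfwidth_kana_to_utf8(unsigned char sjis, unsigned char *dst)
{
    unsigned int cp;

    assert(dst != NULL);

    if (sjis < 0xA1 || sjis > 0xDF)
        return 0;

    /* 0xA1..0xDF map linearly onto U+FF61..U+FF9F */
    cp = 0xFF61u + (sjis - 0xA1u);

    /* U+0800..U+FFFF always encode as three UTF-8 bytes */
    dst[0] = (unsigned char) (0xE0 | (cp >> 12));
    dst[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    dst[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}
```

For example, the SJIS byte 0xB1 maps to U+FF71, which is EF BD B1 in
UTF-8: one input byte, three output bytes.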

However, since user-defined conversions could have an arbitrary growth
rate, it would probably be better to leave it as it is now.
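
The trade-off can be sketched as follows. This is only an illustration:
worst_case_dest_size and ALLOC_LIMIT are names I made up, and the limit
of 1 GB minus one byte is an assumption about the allocation ceiling,
not something taken from mbutils.c. The point is that the multiplier
bounds the longest input we can accept, while a multiplier smaller than
what a conversion really produces would under-size the buffer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONVERSION_GROWTH 4

/* Assumed allocation ceiling, for illustration only */
#define ALLOC_LIMIT ((size_t) 0x3fffffff)

/*
 * Hypothetical helper: return the worst-case destination buffer size
 * (including a trailing NUL) for src_len input bytes and the given
 * growth factor, or 0 if that size would exceed ALLOC_LIMIT.
 */
size_t
worst_case_dest_size(size_t src_len, size_t growth)
{
    assert(growth > 0);

    /* Compare before multiplying so the product cannot overflow */
    if (src_len > (ALLOC_LIMIT - 1) / growth)
        return 0;

    return src_len * growth + 1;
}
```

Under these assumptions a 256 MB input (268435456 bytes) is rejected
with a factor of 4 but accepted with a factor of 3, which is the effect
on long COPY lines that Tom describes.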

For 8.4, maybe we could change the conversion function's signature so
that we don't need a fixed conversion growth rate, as Tom suggested.
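
One possible shape for such a signature, purely as a sketch and not
necessarily what Tom had in mind: the function is told the destination
capacity and returns the number of bytes the result needs, so the caller
can measure first and allocate exactly. Latin-1 to UTF-8 is used here
only because it is simple; latin1_to_utf8 and its parameters are
hypothetical and do not match the existing conversion function API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical length-aware conversion: converts src_len bytes of
 * Latin-1 to UTF-8. Writes a character only if it fits entirely within
 * dst_cap, and always returns the total number of bytes required.
 * Passing dst == NULL measures the output without writing anything.
 */
size_t
latin1_to_utf8(const unsigned char *src, size_t src_len,
               unsigned char *dst, size_t dst_cap)
{
    size_t needed = 0;
    size_t i;

    assert(src != NULL || src_len == 0);

    for (i = 0; i < src_len; i++)
    {
        unsigned char b = src[i];

        if (b < 0x80)
        {
            /* ASCII passes through as one byte */
            if (dst != NULL && needed < dst_cap)
                dst[needed] = b;
            needed += 1;
        }
        else
        {
            /* 0x80..0xFF becomes a two-byte UTF-8 sequence */
            if (dst != NULL && needed + 1 < dst_cap)
            {
                dst[needed] = (unsigned char) (0xC0 | (b >> 6));
                dst[needed + 1] = (unsigned char) (0x80 | (b & 0x3F));
            }
            needed += 2;
        }
    }
    return needed;
}
```

A caller would invoke it once with dst == NULL to get the required
size, allocate that much, and call it again. The cost is a second pass
over the input, in exchange for not needing any fixed multiplier.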
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Where are we on this?
>
> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
> > I just rearranged the code in mbutils.c a little bit to make it more
> > robust if conversion of an over-length string is attempted, and noted
> > this comment:
> >
> > /*
> > * When converting strings between different encodings, we assume that space
> > * for converted result is 4-to-1 growth in the worst case. The rate for
> > * currently supported encoding pairs are within 3 (SJIS JIS X0201 half width
> > * kanna -> UTF8 is the worst case). So "4" should be enough for the moment.
> > *
> > * Note that this is not the same as the maximum character width in any
> > * particular encoding.
> > */
> > #define MAX_CONVERSION_GROWTH 4
> >
> > It strikes me that this is overly pessimistic, since we do not support
> > 5- or 6-byte UTF8 characters, and AFAICS there are no 1-byte characters
> > in any supported encoding that require 4 bytes in another. Could we
> > reduce the multiplier to 3? Or even 2? This has a direct impact on the
> > longest COPY lines we can support, so I'd like it not to be larger than
> > necessary.
> >
> > regards, tom lane
> >
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +
