Re: What is the maximum encoding-conversion growth rate, anyway?

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is the maximum encoding-conversion growth rate, anyway?
Date: 2007-07-18 15:09:10
Message-ID: 200707181509.l6IF9AE12790@momjian.us
Lists: pgsql-hackers


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Tatsuo Ishii wrote:
> The conclusion of the discussion appears to be that we could safely
> reduce MAX_CONVERSION_GROWTH from 4 to 3 with all existing built-in
> conversions.
>
> However, since user-defined conversions could have an arbitrary growth
> rate, it would probably be better to leave it as it is for now.
>
> For 8.4, maybe we could change the conversion function's signature so
> that we don't need a fixed conversion growth rate, as Tom suggested.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
> > Where are we on this?
> >
> > ---------------------------------------------------------------------------
> >
> > Tom Lane wrote:
> > > I just rearranged the code in mbutils.c a little bit to make it more
> > > robust if conversion of an over-length string is attempted, and noted
> > > this comment:
> > >
> > > /*
> > > * When converting strings between different encodings, we assume that space
> > > * for converted result is 4-to-1 growth in the worst case. The rate for
> > > * currently supported encoding pairs is within 3 (SJIS JIS X0201 half-width
> > > * kana -> UTF8 is the worst case). So "4" should be enough for the moment.
> > > *
> > > * Note that this is not the same as the maximum character width in any
> > > * particular encoding.
> > > */
> > > #define MAX_CONVERSION_GROWTH 4
> > >
> > > It strikes me that this is overly pessimistic, since we do not support
> > > 5- or 6-byte UTF8 characters, and AFAICS there are no 1-byte characters
> > > in any supported encoding that require 4 bytes in another. Could we
> > > reduce the multiplier to 3? Or even 2? This has a direct impact on the
> > > longest COPY lines we can support, so I'd like it not to be larger than
> > > necessary.
> > >
> > > regards, tom lane
> >
> > --
> > Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> > EnterpriseDB http://www.enterprisedb.com
> >
> > + If your life is a hard drive, Christ can be your backup. +

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
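
To make the trade-off discussed in this thread concrete, here is a minimal C sketch of how a fixed growth multiplier bounds the longest string that can be converted, and why the half-width kana case needs a factor of 3. It is not the code in mbutils.c. The names SKETCH_MAX_ALLOC, max_convertible_len, worst_case_buffer_size and halfwidth_kana_to_utf8 are hypothetical, and the allocation cap of 0x3fffffff bytes (1 GB - 1) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplier quoted in the thread; the proposal was to lower it to 3. */
#define MAX_CONVERSION_GROWTH 4

/* Assumed cap on a single allocation (1 GB - 1); hypothetical name. */
#define SKETCH_MAX_ALLOC ((size_t) 0x3fffffff)

/*
 * Longest input (in bytes) whose worst-case output, len * growth plus
 * one byte for the terminating NUL, still fits under the assumed cap.
 */
static size_t
max_convertible_len(size_t growth)
{
    assert(growth > 0);
    return (SKETCH_MAX_ALLOC - 1) / growth;
}

/*
 * Worst-case output buffer size for an input of len bytes, or 0 if the
 * input is too long to convert under the assumed cap.
 */
static size_t
worst_case_buffer_size(size_t len, size_t growth)
{
    if (len > max_convertible_len(growth))
        return 0;
    return len * growth + 1;
}

/*
 * The worst case named in the quoted comment: a single-byte JIS X 0201
 * half-width kana (0xA1..0xDF) maps to U+FF61..U+FF9F, which takes
 * three bytes in UTF-8. Returns the number of bytes written, or 0 if
 * the input byte is outside the kana range.
 */
static size_t
halfwidth_kana_to_utf8(unsigned char sjis, unsigned char out[3])
{
    uint32_t    cp;

    if (sjis < 0xA1 || sjis > 0xDF)
        return 0;
    cp = 0xFF61u + (uint32_t) (sjis - 0xA1);
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}
```

Under these assumptions, max_convertible_len(MAX_CONVERSION_GROWTH) is about 256 MB, while max_convertible_len(3) is about 341 MB. That difference is the practical gain for long COPY lines that Tom's question is about.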
