Re: What is the maximum encoding-conversion growth rate, anyway?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: What is the maximum encoding-conversion growth rate, anyway?
Date: 2007-05-29 02:23:42
Message-ID: 24469.1180405422@sss.pgh.pa.us
Lists: pgsql-hackers

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> I'm afraid we have to make it larger, rather than smaller, for 8.3. For
> example, 0x82f5 in SHIFT_JIS_2004 (new in 8.3) becomes a *pair* of 3-byte
> UTF_8 sequences (0x00e3818b and 0x00e3829a). See
> util/mb/Unicode/shift_jis_2004_to_utf8_combined.map for more details.

> So the worst case is now 6, rather than 3.
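
To make the arithmetic in the quoted example concrete, here is a minimal sketch that decodes the two quoted UTF-8 sequences. The helper name utf8_decode3 and the array name are hypothetical, chosen for illustration; they are not taken from the PostgreSQL source. The one 2-byte SHIFT_JIS_2004 character yields 6 bytes of UTF-8, a base letter followed by a combining mark.

```c
/* Hypothetical helper for illustration: decode one 3-byte UTF-8 sequence. */
static unsigned int
utf8_decode3(const unsigned char *s)
{
    return ((unsigned int) (s[0] & 0x0F) << 12) |
           ((unsigned int) (s[1] & 0x3F) << 6) |
            (unsigned int) (s[2] & 0x3F);
}

/* The UTF-8 output quoted above for the single SJIS-2004 character 0x82f5. */
static const unsigned char sjis_82f5_as_utf8[] = {
    0xE3, 0x81, 0x8B,   /* U+304B HIRAGANA LETTER KA */
    0xE3, 0x82, 0x9A    /* U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK */
};
```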

Yipes.

> Can we add a column to pg_conversion which represents the "growth
> rate"? This would reduce the rate for most encodings to much less than
> 6.

We need to do something, but the pg_conversion catalog seems a bad place
to put the info --- don't we have places that need to be able to do
conversion without catalog access?

Perhaps better would be to redefine the API for the conversion functions
so that they palloc their own result space. Then each conversion
function would have to know the maximum growth rate for its particular
conversion. This change would also make it feasible for a conversion
function to prescan the data and determine an exact output size, if that
seemed worthwhile because the potential growth rate was too extreme.
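
A rough sketch of what such a self-allocating conversion function might look like, using ISO-8859-1 to UTF-8 as a deliberately simple stand-in. Everything here is an assumption for illustration: the name latin1_to_utf8_alloc and its signature are hypothetical, not an existing PostgreSQL API, and malloc() stands in for palloc() so the example compiles outside the backend. It prescans the input to compute the exact output size rather than relying on a global worst-case growth factor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not PostgreSQL's actual conversion API: convert
 * ISO-8859-1 (Latin-1) to UTF-8, allocating the result itself.
 * malloc() stands in for palloc() so the sketch compiles standalone.
 */
static unsigned char *
latin1_to_utf8_alloc(const unsigned char *src, size_t len, size_t *outlen)
{
    size_t need = 0;
    size_t i;
    unsigned char *out;
    unsigned char *dst;

    /* Prescan: ASCII bytes stay 1 byte, 0x80..0xFF become 2 bytes. */
    for (i = 0; i < len; i++)
        need += (src[i] < 0x80) ? 1 : 2;

    out = malloc(need + 1);     /* +1 for a trailing NUL */
    if (out == NULL)
        return NULL;

    dst = out;
    for (i = 0; i < len; i++)
    {
        if (src[i] < 0x80)
            *dst++ = src[i];
        else
        {
            /* Two-byte form: 110xxxxx 10xxxxxx */
            *dst++ = 0xC0 | (src[i] >> 6);
            *dst++ = 0x80 | (src[i] & 0x3F);
        }
    }
    *dst = '\0';

    /* The prescan was exact, so the buffer is neither overrun nor padded. */
    assert((size_t) (dst - out) == need);
    *outlen = need;
    return out;
}
```

The caller no longer needs to know any growth rate; a conversion with a small fixed bound could skip the prescan and simply allocate len times its own maximum.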

regards, tom lane
