From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | What is the maximum encoding-conversion growth rate, anyway? |
Date: | 2007-05-28 16:53:49 |
Message-ID: | 29182.1180371229@sss.pgh.pa.us |
Lists: | pgsql-hackers |
I just rearranged the code in mbutils.c a little bit to make it more
robust if conversion of an over-length string is attempted, and noted
this comment:
```c
/*
 * When converting strings between different encodings, we assume that space
 * for converted result is 4-to-1 growth in the worst case. The rates for
 * currently supported encoding pairs are within 3 (SJIS JIS X0201 half width
 * kana -> UTF8 is the worst case). So "4" should be enough for the moment.
 *
 * Note that this is not the same as the maximum character width in any
 * particular encoding.
 */
#define MAX_CONVERSION_GROWTH 4
```
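As a rough check of the worst case named in that comment, here is a minimal C sketch. It assumes only the standard JIS X 0201 mapping, in which the single-byte SJIS half-width katakana 0xA1..0xDF correspond, in order, to U+FF61..U+FF9F. The function names are hypothetical and this is not PostgreSQL's conversion code; it just computes UTF-8 lengths.

```c
#include <assert.h>

/* Bytes needed to encode one Unicode code point in UTF-8.
 * Code points stop at U+10FFFF, so 4 is the maximum; the old
 * 5- and 6-byte forms are not valid UTF-8. */
static int
utf8_len(unsigned int cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/* Hypothetical helper: single-byte SJIS half-width katakana
 * (0xA1..0xDF) map one-to-one, in order, onto U+FF61..U+FF9F. */
static unsigned int
sjis_halfwidth_kana_to_ucs(unsigned char b)
{
    assert(b >= 0xA1 && b <= 0xDF);
    return 0xFF61u + (unsigned int) (b - 0xA1);
}

/* Hypothetical helper: worst growth ratio over the whole half-width
 * kana range. Each input is 1 byte, so the ratio is the UTF-8 length. */
static int
sjis_halfwidth_kana_worst_growth(void)
{
    int worst = 0;

    for (unsigned int b = 0xA1; b <= 0xDF; b++)
    {
        int n = utf8_len(sjis_halfwidth_kana_to_ucs((unsigned char) b));

        if (n > worst)
            worst = n;
    }
    return worst;
}
```

All half-width kana fall in the three-byte UTF-8 range (U+0800..U+FFFF), which matches the growth of 3 the comment reports; a four-byte UTF-8 sequence is only needed for code points above U+FFFF.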
It strikes me that a multiplier of 4 is overly pessimistic, since we do not support
5- or 6-byte UTF8 characters, and AFAICS there are no 1-byte characters
in any supported encoding that require 4 bytes in another. Could we
reduce the multiplier to 3? Or even 2? This has a direct impact on the
longest COPY lines we can support, so I'd like it not to be larger than
necessary.
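To make the COPY-line point concrete, here is a minimal C sketch. It assumes the converted buffer is allocated as len * growth + 1 bytes and must fit in a single allocation capped at PostgreSQL's MaxAllocSize, taken here to be 0x3fffffff (1 GB minus one byte). The helper names are hypothetical; this is not the code in mbutils.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed allocation ceiling: PostgreSQL's MaxAllocSize, 0x3fffffff bytes. */
#define ASSUMED_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Hypothetical helper: largest input length whose worst-case converted
 * size, len * growth plus a terminating NUL, still fits in one allocation. */
static size_t
max_convertible_input(size_t growth)
{
    assert(growth >= 1);
    return (ASSUMED_MAX_ALLOC_SIZE - 1) / growth;
}

/* Hypothetical guard: reject an over-length input before multiplying,
 * so the size computation itself cannot overflow. Returns 0 if the
 * input is too long to convert. */
static size_t
conversion_buffer_size(size_t len, size_t growth)
{
    if (len > max_convertible_input(growth))
        return 0;
    return len * growth + 1;
}
```

Under those assumptions the longest convertible input is 268435455 bytes with a multiplier of 4, 357913940 with 3, and 536870911 with 2, so dropping to 3 raises the ceiling by about a third and dropping to 2 roughly doubles it. Checking the length before multiplying is one way to reject over-length strings without overflowing the size computation.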
regards, tom lane