Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, sagi(at)adamnet(dot)co(dot)il, pgsql-patches(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion
Date: 2005-12-23 01:53:09
Message-ID: 200512230153.jBN1r9810536@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-patches


That is a nice solution --- instead of listing all the encodings, you
listed just the ones that need to be used. The list is shorter and
clearer. It seems like the right approach. Thanks.

---------------------------------------------------------------------------

Tatsuo Ishii wrote:
> > Tom Lane wrote:
> > > Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> > > >> It looks like somebody rearranged the pg_enc enum without bothering to
> > > >> fix the tables that are affected by this.
> > >
> > > > I will look into this.
> > >
> > > Thank you. It might be worth adding a comment to pg_wchar.h listing all
> > > the places that need to be fixed when enum pg_enc changes.
> > >
> >
> > I have developed the following patch against CVS. Tatsuo, you can use
> > it as a starting point. It adds a comment to encnames.c and reorders
> > utf8_and_iso8859.c to match the existing order. I also added the
> > missing entries at the bottom. I checked for pg_conv_map in the source
> > code and only utf8_and_iso8859.c has that structure, so I assume it is
> > the only one that also depends on the encnames.c ordering.
>
> I think the current implementaion in utf8_and_iso8859.c is fast but
> too fragile against rearranging of encoding id. I modify those functions
> in utf8_and_iso8859.c to do a linear search with encoding id. With
> this change developers feel free to rearrange encoding id, and this
> kind of problems will be gone forever. The only penalty is the time of
> searching 13 entries in the encoding map. We can do a quick sort but
> it will need sorted entry by encoding id and may cause similar problem
> in the future. So I'm not sure it's worth doing the quick sort.
>
> Propsed patch attached.
>
> > Looking at 8.0.X, it has the matching order, so we are OK there, but it
> > doesn't have the trailing entries. Tatsuo, are those needed?
>
> I think it's OK, since the last missing entry will never be visited.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan

> Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v
> retrieving revision 1.16
> diff -u -r1.16 utf8_and_iso8859.c
> --- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c 22 Nov 2005 18:17:26 -0000 1.16
> +++ src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c 23 Dec 2005 01:43:38 -0000
> @@ -68,15 +68,6 @@
> } pg_conv_map;
>
> static pg_conv_map maps[] = {
> - {PG_SQL_ASCII}, /* SQL/ASCII */
> - {PG_EUC_JP}, /* EUC for Japanese */
> - {PG_EUC_CN}, /* EUC for Chinese */
> - {PG_EUC_KR}, /* EUC for Korean */
> - {PG_EUC_TW}, /* EUC for Taiwan */
> - {PG_JOHAB}, /* EUC for Korean JOHAB */
> - {PG_UTF8}, /* Unicode UTF8 */
> - {PG_MULE_INTERNAL}, /* Mule internal code */
> - {PG_LATIN1}, /* ISO-8859-1 Latin 1 */
> {PG_LATIN2, LUmapISO8859_2, ULmapISO8859_2,
> sizeof(LUmapISO8859_2) / sizeof(pg_local_to_utf),
> sizeof(ULmapISO8859_2) / sizeof(pg_utf_to_local)}, /* ISO-8859-2 Latin 2 */
> @@ -104,12 +95,6 @@
> {PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
> sizeof(LUmapISO8859_16) / sizeof(pg_local_to_utf),
> sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 Latin 10 */
> - {PG_WIN1256}, /* windows-1256 */
> - {PG_WIN1258}, /* Windows-1258 */
> - {PG_WIN874}, /* windows-874 */
> - {PG_KOI8R}, /* KOI8-R */
> - {PG_WIN1251}, /* windows-1251 */
> - {PG_WIN866}, /* (MS-DOS CP866) */
> {PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5,
> sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf),
> sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)}, /* ISO-8859-5 */
> @@ -131,11 +116,23 @@
> unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
> unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
> int len = PG_GETARG_INT32(4);
> + int i;
>
> Assert(PG_GETARG_INT32(1) == PG_UTF8);
> Assert(len >= 0);
>
> - LocalToUtf(src, dest, maps[encoding].map1, maps[encoding].size1, encoding, len);
> + for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++)
> + {
> + if (encoding == maps[i].encoding)
> + {
> + LocalToUtf(src, dest, maps[i].map1, maps[i].size1, encoding, len);
> + PG_RETURN_VOID();
> + }
> + }
> +
> + ereport(ERROR,
> + (errcode(ERRCODE_INTERNAL_ERROR),
> + errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding)));
>
> PG_RETURN_VOID();
> }
> @@ -147,11 +144,23 @@
> unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
> unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
> int len = PG_GETARG_INT32(4);
> + int i;
>
> Assert(PG_GETARG_INT32(0) == PG_UTF8);
> Assert(len >= 0);
>
> - UtfToLocal(src, dest, maps[encoding].map2, maps[encoding].size2, len);
> + for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++)
> + {
> + if (encoding == maps[i].encoding)
> + {
> + UtfToLocal(src, dest, maps[i].map2, maps[i].size2, len);
> + PG_RETURN_VOID();
> + }
> + }
> +
> + ereport(ERROR,
> + (errcode(ERRCODE_INTERNAL_ERROR),
> + errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding)));
>
> PG_RETURN_VOID();
> }

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tatsuo Ishii 2005-12-23 01:59:06 Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8
Previous Message Tatsuo Ishii 2005-12-23 01:42:07 Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding

Browse pgsql-patches by date

  From Date Subject
Next Message Tatsuo Ishii 2005-12-23 01:59:06 Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8
Previous Message Tatsuo Ishii 2005-12-23 01:42:07 Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding