Skip site navigation (1) Skip section navigation (2)

Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: pgman(at)candle(dot)pha(dot)pa(dot)us
Cc: ishii(at)sraoss(dot)co(dot)jp, tgl(at)sss(dot)pgh(dot)pa(dot)us, sagi(at)adamnet(dot)co(dot)il,pgsql-patches(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8
Date: 2005-12-23 01:59:06
Message-ID: 20051223.105906.127176963.t-ishii@sraoss.co.jp (view raw or flat)
Thread:
Lists: pgsql-bugspgsql-patches
Ok, I will commit the patches.

BTW, the example code sequence (ISO-8859-8) Sagi posted seems to have
wrong one.

select '';
WARNING:  ignoring unconvertible ISO_8859_8 character 0x00d7
:
:

0x00d7(\327) is not listed in our ISO-8858-8/UTF-8 conversion map. Is
this OK or do we need to add the conversion for the code?
What do you think, Sega?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> That is a nice solution --- instead of listing all the encodings, you
> listed just the ones that need to be used.  The list is shorter and
> clearer.  It seems like the right approach.  Thanks.
> 
> ---------------------------------------------------------------------------
> 
> Tatsuo Ishii wrote:
> > > Tom Lane wrote:
> > > > Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> > > > >> It looks like somebody rearranged the pg_enc enum without bothering to
> > > > >> fix the tables that are affected by this.
> > > > 
> > > > > I will look into this.
> > > > 
> > > > Thank you.  It might be worth adding a comment to pg_wchar.h listing all
> > > > the places that need to be fixed when enum pg_enc changes.
> > > > 
> > > 
> > > I have developed the following patch against CVS.  Tatsuo, you can use
> > > it as a starting point.  It adds a comment to encnames.c and reorders
> > > utf8_and_iso8859.c to match the existing order.  I also added the
> > > missing entries at the bottom.  I checked for pg_conv_map in the source
> > > code and only utf8_and_iso8859.c has that structure, so I assume it is
> > > the only one that also depends on the encnames.c ordering.
> > 
> > I think the current implementaion in utf8_and_iso8859.c is fast but
> > too fragile against rearranging of encoding id. I modify those functions
> > in utf8_and_iso8859.c to do a linear search with encoding id. With
> > this change developers feel free to rearrange encoding id, and this
> > kind of problems will be gone forever. The only penalty is the time of
> > searching 13 entries in the encoding map. We can do a quick sort but
> > it will need sorted entry by encoding id and may cause similar problem
> > in the future. So I'm not sure it's worth doing the quick sort.
> > 
> > Propsed patch attached.
> > 
> > > Looking at 8.0.X, it has the matching order, so we are OK there, but it
> > > doesn't have the trailing entries.  Tatsuo, are those needed?
> > 
> > I think it's OK, since the last missing entry will never be visited.
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> 
> > Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
> > ===================================================================
> > RCS file: /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v
> > retrieving revision 1.16
> > diff -u -r1.16 utf8_and_iso8859.c
> > --- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c	22 Nov 2005 18:17:26 -0000	1.16
> > +++ src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c	23 Dec 2005 01:43:38 -0000
> > @@ -68,15 +68,6 @@
> >  } pg_conv_map;
> >  
> >  static pg_conv_map maps[] = {
> > -	{PG_SQL_ASCII},				/* SQL/ASCII */
> > -	{PG_EUC_JP},				/* EUC for Japanese */
> > -	{PG_EUC_CN},				/* EUC for Chinese */
> > -	{PG_EUC_KR},				/* EUC for Korean */
> > -	{PG_EUC_TW},				/* EUC for Taiwan */
> > -	{PG_JOHAB},					/* EUC for Korean JOHAB */
> > -	{PG_UTF8},					/* Unicode UTF8 */
> > -	{PG_MULE_INTERNAL},			/* Mule internal code */
> > -	{PG_LATIN1},				/* ISO-8859-1 Latin 1 */
> >  	{PG_LATIN2, LUmapISO8859_2, ULmapISO8859_2,
> >  		sizeof(LUmapISO8859_2) / sizeof(pg_local_to_utf),
> >  	sizeof(ULmapISO8859_2) / sizeof(pg_utf_to_local)},	/* ISO-8859-2 Latin 2 */
> > @@ -104,12 +95,6 @@
> >  	{PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
> >  		sizeof(LUmapISO8859_16) / sizeof(pg_local_to_utf),
> >  	sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 Latin 10 */
> > -	{PG_WIN1256},				/* windows-1256 */
> > -	{PG_WIN1258},				/* Windows-1258 */
> > -	{PG_WIN874},				/* windows-874 */
> > -	{PG_KOI8R},					/* KOI8-R */
> > -	{PG_WIN1251},				/* windows-1251 */
> > -	{PG_WIN866},				/* (MS-DOS CP866) */
> >  	{PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5,
> >  		sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf),
> >  	sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)},	/* ISO-8859-5 */
> > @@ -131,11 +116,23 @@
> >  	unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
> >  	unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
> >  	int			len = PG_GETARG_INT32(4);
> > +	int	i;
> >  
> >  	Assert(PG_GETARG_INT32(1) == PG_UTF8);
> >  	Assert(len >= 0);
> >  
> > -	LocalToUtf(src, dest, maps[encoding].map1, maps[encoding].size1, encoding, len);
> > +	for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++)
> > +	{
> > +		if (encoding == maps[i].encoding)
> > +		{
> > +			LocalToUtf(src, dest, maps[i].map1, maps[i].size1, encoding, len);
> > +			PG_RETURN_VOID();
> > +		}
> > +	}
> > +
> > +	ereport(ERROR,
> > +			(errcode(ERRCODE_INTERNAL_ERROR),
> > +			 errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding)));
> >  
> >  	PG_RETURN_VOID();
> >  }
> > @@ -147,11 +144,23 @@
> >  	unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
> >  	unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
> >  	int			len = PG_GETARG_INT32(4);
> > +	int	i;
> >  
> >  	Assert(PG_GETARG_INT32(0) == PG_UTF8);
> >  	Assert(len >= 0);
> >  
> > -	UtfToLocal(src, dest, maps[encoding].map2, maps[encoding].size2, len);
> > +	for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++)
> > +	{
> > +		if (encoding == maps[i].encoding)
> > +		{
> > +			UtfToLocal(src, dest, maps[i].map2, maps[i].size2, len);
> > +			PG_RETURN_VOID();
> > +		}
> > +	}
> > +
> > +	ereport(ERROR,
> > +			(errcode(ERRCODE_INTERNAL_ERROR),
> > +			 errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding)));
> >  
> >  	PG_RETURN_VOID();
> >  }
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 

In response to

Responses

pgsql-bugs by date

Next:From: Tom LaneDate: 2005-12-23 04:38:26
Subject: Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding
Previous:From: Bruce MomjianDate: 2005-12-23 01:53:09
Subject: Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion

pgsql-patches by date

Next:From: Bruce MomjianDate: 2005-12-23 02:04:40
Subject: Re: Disparity in search_path SHOW and SET
Previous:From: Bruce MomjianDate: 2005-12-23 01:53:09
Subject: Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group