Re: [HACKERS] fix for multi-byte partial truncating

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: t-ishii(at)sra(dot)co(dot)jp
Cc: hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] fix for multi-byte partial truncating
Date: 1998-09-25 01:47:15
Message-ID: 199809250147.VAA22725@candle.pha.pa.us
Lists: pgsql-hackers

Applied, but for some reason the patch program did not like the normal
cvs/rcs diff format.  Not sure why.  Please check that the result is OK.
It looks OK here.


> For the varchar(n)/char(n) types, an input string is silently truncated
> if it is longer than n. A multi-byte character consists of several bytes,
> and those bytes must not be split apart. Truncating unconditionally at a
> byte count can cut a multi-byte character in the middle and leave a
> partial byte sequence behind.
> 
> The attached patches should fix the problem.
> 
> Index: backend/utils/adt/varchar.c
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v
> retrieving revision 1.39
> diff -c -r1.39 varchar.c
> *** varchar.c	1998/09/01 04:32:53	1.39
> --- varchar.c	1998/09/24 09:03:37
> ***************
> *** 147,153 ****
> --- 147,160 ----
>   	if ((len == -1) || (len == VARSIZE(s)))
>   		return s;
>   
> + #ifdef MULTIBYTE
> + 	/* truncate multi-byte string in a way not to break
> + 	   multi-byte boundary */
> + 	rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ);
> + 	len = rlen + VARHDRSZ;
> + #else
>   	rlen = len - VARHDRSZ;
> + #endif
>   
>   	if (rlen > 4096)
>   		elog(ERROR, "bpchar: length of char() must be less than 4096");
> ***************
> *** 367,373 ****
> --- 374,387 ----
>   
>   	/* only reach here if we need to truncate string... */
>   
> + #ifdef MULTIBYTE
> + 	/* truncate multi-byte string in a way not to break
> + 	   multi-byte boundary */
> + 	len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ);
> + 	slen = len + VARHDRSZ;
> + #else
>   	len = slen - VARHDRSZ;
> + #endif
>   
>   	if (len > 4096)
>   		elog(ERROR, "varchar: length of varchar() must be less than 4096");
> Index: backend/utils/mb/mbutils.c
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
> retrieving revision 1.3
> diff -c -r1.3 mbutils.c
> *** mbutils.c	1998/09/01 04:33:22	1.3
> --- mbutils.c	1998/09/24 09:03:38
> ***************
> *** 202,207 ****
> --- 202,235 ----
>   }
>   
>   /*
> +  * returns the length of a multi-byte string
> +  * (not necessarily  NULL terminated)
> +  * that is not longer than limit.
> +  * this function does not break multi-byte word boundary.
> +  */
> + int
> + pg_mbcliplen(const unsigned char *mbstr, int len, int limit)
> + {
> + 	int			clen = 0;
> + 	int			l;
> + 
> + 	while (*mbstr &&  len > 0)
> + 	{
> + 		l = pg_mblen(mbstr);
> + 		if ((clen + l) > limit) {
> + 			break;
> + 		}
> + 		clen += l;
> + 		if (clen == limit) {
> + 			break;
> + 		}
> + 		len -= l;
> + 		mbstr += l;
> + 	}
> + 	return (clen);
> + }
> + 
> + /*
>    * fuctions for utils/init
>    */
>   static int	DatabaseEncoding = MULTIBYTE;
> Index: include/mb/pg_wchar.h
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v
> retrieving revision 1.4
> diff -c -r1.4 pg_wchar.h
> *** pg_wchar.h	1998/09/01 04:36:34	1.4
> --- pg_wchar.h	1998/09/24 09:03:42
> ***************
> *** 103,108 ****
> --- 103,109 ----
>   extern int	pg_mic_mblen(const unsigned char *);
>   extern int	pg_mbstrlen(const unsigned char *);
>   extern int	pg_mbstrlen_with_len(const unsigned char *, int);
> + extern int	pg_mbcliplen(const unsigned char *, int, int);
>   extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int);
>   extern bool show_client_encoding(void);
>   extern bool reset_client_encoding(void);
> 
> 

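For anyone checking the clipping behaviour outside the backend, here is a
minimal standalone sketch of the same idea. It is not the patched code: it
assumes well-formed UTF-8 input, and both utf8_mblen() (a stand-in for
pg_mblen(), which depends on the database encoding) and mb_cliplen() are
hypothetical names made up for this illustration. Unlike the patch, the
sketch does not stop at a zero byte and also refuses to read a character
that would run past len.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the UTF-8
 * character starting at s, judged from its lead byte only.
 * Assumes well-formed UTF-8; an unexpected lead byte counts as 1.
 */
static int
utf8_mblen(const unsigned char *s)
{
	if ((*s & 0x80) == 0x00)
		return 1;
	if ((*s & 0xE0) == 0xC0)
		return 2;
	if ((*s & 0xF0) == 0xE0)
		return 3;
	if ((*s & 0xF8) == 0xF0)
		return 4;
	return 1;
}

/*
 * Return the largest byte count that is <= limit and ends on a
 * character boundary within the first len bytes of s.
 */
static int
mb_cliplen(const unsigned char *s, int len, int limit)
{
	int			clen = 0;

	assert(s != NULL);
	while (clen < len)
	{
		int			l = utf8_mblen(s + clen);

		/* stop before a character that would cross limit or len */
		if (clen + l > limit || clen + l > len)
			break;
		clen += l;
	}
	return clen;
}
```

With a string of three 3-byte characters (9 bytes), a limit of 4 clips to 3
bytes rather than leaving one stray byte of the second character, and a
limit of 2 clips to 0 bytes.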

-- 
  Bruce Momjian                        |  maillist(at)candle(dot)pha(dot)pa(dot)us
  830 Blythe Avenue                    |  http://www.op.net/~candle
  Drexel Hill, Pennsylvania 19026      |  (610) 353-9879(w)
  +  If your life is a hard drive,     |  (610) 853-3000(h)
  +  Christ can be your backup.        |  
