Re: [HACKERS] fix for multi-byte partial truncating

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: t-ishii(at)sra(dot)co(dot)jp
Cc: hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] fix for multi-byte partial truncating
Date: 1998-09-25 01:47:15
Message-ID: 199809250147.VAA22725@candle.pha.pa.us
Lists: pgsql-hackers

Applied, but for some reason the patch program did not like the normal
cvs/rcs diff format, and I am not sure why. Please check that the result
is OK. It looks OK here.

> For the varchar(n)/char(n) types, an input string is silently truncated
> if it is longer than n. A multi-byte letter consists of several bytes,
> and they should not be divided into pieces. Truncating unconditionally
> can cut a multi-byte letter in the middle and leave a partial byte
> sequence at the end of the string.
>
> Attached patches should fix the problem.
>
> Index: backend/utils/adt/varchar.c
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/adt/varchar.c,v
> retrieving revision 1.39
> diff -c -r1.39 varchar.c
> *** varchar.c 1998/09/01 04:32:53 1.39
> --- varchar.c 1998/09/24 09:03:37
> ***************
> *** 147,153 ****
> --- 147,160 ----
> if ((len == -1) || (len == VARSIZE(s)))
> return s;
>
> + #ifdef MULTIBYTE
> + /* truncate multi-byte string in a way not to break
> + multi-byte boundary */
> + rlen = pg_mbcliplen(VARDATA(s), len - VARHDRSZ, len - VARHDRSZ);
> + len = rlen + VARHDRSZ;
> + #else
> rlen = len - VARHDRSZ;
> + #endif
>
> if (rlen > 4096)
> elog(ERROR, "bpchar: length of char() must be less than 4096");
> ***************
> *** 367,373 ****
> --- 374,387 ----
>
> /* only reach here if we need to truncate string... */
>
> + #ifdef MULTIBYTE
> + /* truncate multi-byte string in a way not to break
> + multi-byte boundary */
> + len = pg_mbcliplen(VARDATA(s), slen - VARHDRSZ, slen - VARHDRSZ);
> + slen = len + VARHDRSZ;
> + #else
> len = slen - VARHDRSZ;
> + #endif
>
> if (len > 4096)
> elog(ERROR, "varchar: length of varchar() must be less than 4096");
> Index: backend/utils/mb/mbutils.c
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
> retrieving revision 1.3
> diff -c -r1.3 mbutils.c
> *** mbutils.c 1998/09/01 04:33:22 1.3
> --- mbutils.c 1998/09/24 09:03:38
> ***************
> *** 202,207 ****
> --- 202,235 ----
> }
>
> /*
> + * returns the length of a multi-byte string
> + * (not necessarily NULL terminated)
> + * that is not longer than limit.
> + * this function does not break multi-byte word boundary.
> + */
> + int
> + pg_mbcliplen(const unsigned char *mbstr, int len, int limit)
> + {
> + int clen = 0;
> + int l;
> +
> + while (*mbstr && len > 0)
> + {
> + l = pg_mblen(mbstr);
> + if ((clen + l) > limit) {
> + break;
> + }
> + clen += l;
> + if (clen == limit) {
> + break;
> + }
> + len -= l;
> + mbstr += l;
> + }
> + return (clen);
> + }
> +
> + /*
> * fuctions for utils/init
> */
> static int DatabaseEncoding = MULTIBYTE;
> Index: include/mb/pg_wchar.h
> ===================================================================
> RCS file: /usr/local/cvsroot/pgsql/src/include/mb/pg_wchar.h,v
> retrieving revision 1.4
> diff -c -r1.4 pg_wchar.h
> *** pg_wchar.h 1998/09/01 04:36:34 1.4
> --- pg_wchar.h 1998/09/24 09:03:42
> ***************
> *** 103,108 ****
> --- 103,109 ----
> extern int pg_mic_mblen(const unsigned char *);
> extern int pg_mbstrlen(const unsigned char *);
> extern int pg_mbstrlen_with_len(const unsigned char *, int);
> + extern int pg_mbcliplen(const unsigned char *, int, int);
> extern pg_encoding_conv_tbl *pg_get_encent_by_encoding(int);
> extern bool show_client_encoding(void);
> extern bool reset_client_encoding(void);
>
>
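
For readers following the thread, here is a minimal standalone sketch of the clipping idea in the patch. It is not the applied code. It assumes UTF-8 input, and utf8_char_len() and clip_len() are hypothetical names: the first stands in for pg_mblen(), the second mirrors the loop in pg_mbcliplen(). Two details are additions of this sketch rather than part of the patch: the remaining length is tested before the byte is read, and a character that would run past the end of the input is not counted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the UTF-8
 * character starting at s, judged from the lead byte only.
 */
static int
utf8_char_len(const unsigned char *s)
{
	if (*s < 0x80)
		return 1;				/* ASCII */
	if ((*s & 0xE0) == 0xC0)
		return 2;
	if ((*s & 0xF0) == 0xE0)
		return 3;
	if ((*s & 0xF8) == 0xF0)
		return 4;
	return 1;					/* invalid lead byte: count it as one byte */
}

/*
 * Return the largest byte count <= limit that ends on a character
 * boundary of mbstr, which holds len bytes and need not be
 * NUL-terminated. Same loop shape as pg_mbcliplen() in the patch.
 */
static int
clip_len(const unsigned char *mbstr, int len, int limit)
{
	int			clen = 0;

	assert(mbstr != NULL && len >= 0 && limit >= 0);

	/* test len first so no byte beyond the buffer is read (sketch-only change) */
	while (len > 0 && *mbstr)
	{
		int			l = utf8_char_len(mbstr);

		/* stop if the next character does not fit in the input or the limit */
		if (l > len || clen + l > limit)
			break;
		clen += l;
		if (clen == limit)
			break;
		len -= l;
		mbstr += l;
	}
	return clen;
}
```

A caller would use the result the way the varchar.c hunks do: take the clipped data length and add the header size back to get the stored length. For the four-byte string "a", e-acute (0xC3 0xA9), "b", a limit of 2 clips to 1 byte instead of splitting the two-byte letter.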

--
Bruce Momjian | maillist(at)candle(dot)pha(dot)pa(dot)us
830 Blythe Avenue | http://www.op.net/~candle
Drexel Hill, Pennsylvania 19026 | (610) 353-9879(w)
+ If your life is a hard drive, | (610) 853-3000(h)
+ Christ can be your backup. |
