
psql display of Unicode combining characters in 8.2

From:	Michael Fuhr <mike(at)fuhr(dot)org>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	psql display of Unicode combining characters in 8.2
Date:	2006-12-10 05:50:05
Message-ID:	20061210055005.GA25816@winnie.fuhr.org
Lists:	pgsql-hackers

psql's display of Unicode combining characters appears to have
changed in 8.2. For example, I'd expect <U+006E LATIN SMALL LETTER N,
U+0303 COMBINING TILDE> to display the same as the precomposed
<U+00F1 LATIN SMALL LETTER N WITH TILDE>. With 8.1's psql they do,
but with 8.2's psql this sequence displays as:

    SELECT E'n\314\203';  -- \314\203 = UTF-8 encoding of U+0303
     ?column?
    ----------
     n\u0303
    (1 row)

(I'm testing with both server and client using UTF-8.)
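
To make the byte values in the query concrete, here is a minimal sketch that decodes the octal escapes used above. The helper name decode_utf8_short is hypothetical, and it only handles one- and two-byte sequences, which is enough for the two characters discussed here.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of psql): decode a UTF-8 sequence of at
 * most two bytes and return its code point. Anything longer or malformed
 * yields U+FFFD, the replacement character.
 */
static unsigned int
decode_utf8_short(const unsigned char *s)
{
    if (s[0] < 0x80)
        return s[0];            /* plain ASCII */
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80)
        return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    return 0xFFFD;              /* outside the scope of this sketch */
}
```

Under these assumptions, the bytes \314\203 decode to U+0303 and the bytes \303\261 decode to the precomposed U+00F1, so the two forms are different byte sequences even though a terminal would normally render them identically.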

This excerpt from pg_wcsformat() in mbprint.c looks responsible:

    else if (w <= 0)    /* Non-ascii control char */
    {
        if (encoding == PG_UTF8)
            sprintf((char *) ptr, "\\u%04X", utf2ucs(pwcs));

This might be the relevant commit:

http://archives.postgresql.org/pgsql-committers/2006-02/msg00089.php

Should the code distinguish between combining characters and
zero-width control characters so the former display correctly?
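
One way the distinction might look is sketched below. This is not the mbprint.c code and not a proposed patch: classify_codepoint, encode_utf8, and format_codepoint are hypothetical names, and the combining test covers only the Combining Diacritical Marks block (U+0300 to U+036F) as a simplifying assumption. A real implementation would need a fuller table of combining ranges.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical classification; these names are not PostgreSQL's API. */
typedef enum { CP_PRINTABLE, CP_COMBINING, CP_CONTROL } cp_class;

static cp_class
classify_codepoint(unsigned int cp)
{
    /* C0 controls, DEL, and C1 controls. */
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return CP_CONTROL;
    /* Assumption: only the Combining Diacritical Marks block is checked. */
    if (cp >= 0x0300 && cp <= 0x036F)
        return CP_COMBINING;
    return CP_PRINTABLE;
}

/* Encode cp as NUL-terminated UTF-8; out needs at least 5 bytes. */
static size_t
encode_utf8(unsigned int cp, char *out)
{
    size_t n = 0;

    if (cp < 0x80)
        out[n++] = (char) cp;
    else if (cp < 0x800)
    {
        out[n++] = (char) (0xC0 | (cp >> 6));
        out[n++] = (char) (0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out[n++] = (char) (0xE0 | (cp >> 12));
        out[n++] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[n++] = (char) (0x80 | (cp & 0x3F));
    }
    else
    {
        out[n++] = (char) (0xF0 | (cp >> 18));
        out[n++] = (char) (0x80 | ((cp >> 12) & 0x3F));
        out[n++] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[n++] = (char) (0x80 | (cp & 0x3F));
    }
    out[n] = '\0';
    return n;
}

/*
 * Write the display form of cp into out: control characters are escaped
 * as \uXXXX, everything else (including combining marks) is passed
 * through as UTF-8. Returns the number of bytes written, or 0 if the
 * buffer is smaller than 8 bytes.
 */
static size_t
format_codepoint(unsigned int cp, char *out, size_t outlen)
{
    if (outlen < 8)
        return 0;
    if (classify_codepoint(cp) == CP_CONTROL)
        return (size_t) snprintf(out, outlen, "\\u%04X", cp);
    return encode_utf8(cp, out);
}
```

With this sketch, U+0303 is emitted as its raw UTF-8 bytes so the terminal can combine it with the preceding "n", while a control character such as U+0085 is still shown as an escape.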

--
Michael Fuhr

Responses

Re: psql display of Unicode combining characters in 8.2 at 2006-12-10 14:53:36 from Michael Fuhr
Re: psql display of Unicode combining characters in 8.2 at 2006-12-10 16:28:20 from Martijn van Oosterhout
