Re: Combining chars in psql (pre-patch)

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Patrice Hédé <phede-ml(at)islande(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Combining chars in psql (pre-patch)
Date: 2002-03-06 21:16:52
Message-ID: 200203062116.g26LGq011451@candle.pha.pa.us
Lists: pgsql-hackers


"cvs diff -c" shows the differences from your source and cvs.

I am a little confused. What functionality does this add?

---------------------------------------------------------------------------

Patrice Hédé wrote:
> Hi,
>
> I have been working a bit on a patch for that problem in psql. The
> patch is far from being ready for inclusion or whatever, it's just for
> comments...
>
> By the way, can someone tell me how to generate nice patches showing
> the difference between one's version and the cvs code that has been
> downloaded ? I'm new to this (I've only used cvs for personal projects
> so far, and I don't need to send patches to myself ;) ).
>
> The good things in this patch :
>
> - it works for me :)
>
> - I've used Markus Kuhn's implementation of wcwidth.c : it is
> locale-independent, and is in the public domain. :) [if we keep it, I'll
> have to tell him, though !]
>
> - No dependency on the local libc's UTF-8-awareness ;) [I've seen that
> psql has no such dependency, at least in print.c, so I haven't added
> any]. Actually, the change is completely self-contained.
>
> - I've made my own utf-8 -> ucs converter, since I couldn't find any
> without a copyright notice yesterday. It checks for invalid and
> non-optimal (non-shortest-form) UTF-8 sequences, as required by
> Unicode 3.0.1 (or 3.1, I don't remember).
>
> - it works for Japanese (and I believe other "full-width" characters).
>
> - if MULTIBYTE is not defined, the code doesn't change from the
> committed version.
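
[ Editor's sketch, not part of the original message. The attached
converter is not shown in this archive, so the code below is only a
minimal illustration of the kind of strict utf-8 -> ucs decoding the
list above describes. The name utf8_decode_one and its signature are
hypothetical. It returns 0 for truncated, malformed, non-shortest-form,
surrogate and out-of-range input. ]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence from s, with at most len bytes available.
 * On success, store the code point in *cp and return the number of
 * bytes consumed (1..4).  Return 0 if the input is invalid.
 * Hypothetical helper; not the function from the patch.
 */
static size_t
utf8_decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of N bytes. */
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t      need;
    size_t      i;
    uint32_t    c;

    assert(s != NULL && cp != NULL);

    if (len == 0)
        return 0;

    if (s[0] < 0x80)
    {
        *cp = s[0];             /* plain ASCII */
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0)
    {
        need = 2;
        c = s[0] & 0x1F;
    }
    else if ((s[0] & 0xF0) == 0xE0)
    {
        need = 3;
        c = s[0] & 0x0F;
    }
    else if ((s[0] & 0xF8) == 0xF0)
    {
        need = 4;
        c = s[0] & 0x07;
    }
    else
        return 0;               /* stray continuation or invalid lead byte */

    if (len < need)
        return 0;               /* truncated sequence */

    for (i = 1; i < need; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min_for_len[need])
        return 0;               /* non-shortest ("non-optimal") form */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;               /* UTF-16 surrogates are not characters */
    if (c > 0x10FFFF)
        return 0;               /* beyond the Unicode range */

    *cp = c;
    return need;
}
```

A caller would typically loop over the string, advancing by the return
value and stopping at the first 0.
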
>
> The not so good things :
>
> - I've made my own utf-8 -> ucs converter... It seems to work fine,
> but it's not tested well enough, so it may not be very robust.
>
> - The printf( "%*s", width, utfstr) doesn't pad as expected (the field
> width counts bytes, not display columns), so I had to work around it
> with printf( "%*s%s", width - utfstrwidth, "", utfstr);
>
> - everything is in #ifdef MULTIBYTE/#endif, even though there is no
> dependency on anything else (including the rest of the multibyte
> implementation - which I haven't had the time to look at in detail),
> so the code doesn't actually depend on MULTIBYTE.
>
> - I get this (for each call to pg_mb_utfs_width) and I don't know why :
>
> print.c:265: warning: passing arg 1 of `pg_mb_utfs_width' discards
> qualifiers from pointer target type
>
> - If pg_mb_utfs_width finds an invalid UTF-8 string, it truncates it.
> I suppose that's what we want to do, but that's probably not the
> best place to do it.
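
[ Editor's sketch, not part of the original message. It illustrates the
padding workaround from the list above: compute the pad from the display
width and print it separately from the string. The helper names are
hypothetical, and utf8_columns_naive assumes one column per code point,
which is wrong for wide and combining characters; the patch uses a
wcwidth-based count instead. The pad is clamped at 0 because a negative
"*" width makes printf left-justify. ]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
 * Simplifying assumption: every code point occupies one column.
 */
static int
utf8_columns_naive(const char *s)
{
    int         cols = 0;

    for (; *s != '\0'; s++)
    {
        if (((unsigned char) *s & 0xC0) != 0x80)
            cols++;
    }
    return cols;
}

/*
 * Right-align utf8 in a field of `width` columns, writing into out.
 * Returns what snprintf returns.
 */
static int
format_right_aligned(char *out, size_t outsz, int width, const char *utf8)
{
    int         pad;

    assert(out != NULL && utf8 != NULL);

    pad = width - utf8_columns_naive(utf8);
    if (pad < 0)
        pad = 0;                /* string is already wider than the field */

    /* Pad with an empty string, then print the text itself unpadded. */
    return snprintf(out, outsz, "%*s%s", pad, "", utf8);
}
```

Taking the strings as const char * here is also the usual way to avoid a
"discards qualifiers" warning when the caller holds a const pointer.
Whether that is the cause of the warning quoted above is only a guess,
since the attachment is not shown.
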
>
> The bad things :
>
> - If MULTIBYTE is defined, the strings must be in UTF-8; the code
> doesn't check the encoding.
>
> - it is not integrated at all with the rest of the MB code.
>
> - it doesn't respect the indentation policy ;)
>
>
> To do :
>
> - integrate better with the rest of the MB (client-side encoding), and
> with the rest of the code of print.c .
>
> - verify utf8-to-ucs robustness seriously.
>
> - make the code visually nicer :)
>
> - find better function names.
>
> And possibly :
>
> - consolidate the code, in order to remove the need for the #ifdef's
> in many places.
>
> - make it work with some other multi-width encodings (but then, I
> don't know anything about these encodings myself !).
>
> - also check the utf-8 stream at input time, so that no invalid utf-8
> is sent to the backend (at least from psql - the backend will also
> need strict checking for UTF-8).
>
> - add nice UTF-8 borders as an option :)
>
> - add a command-line parameter to treat Unicode Ambiguous
> characters (characters which can be narrow or wide, depending on the
> terminal) as wide characters, as seems to be the case on CJK
> terminals (as per TR#11).
>
> - What else ?
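
[ Editor's sketch, not part of the original message. It illustrates the
width rules mentioned in this thread: combining characters take 0
columns, "full-width" characters take 2, and Ambiguous characters take 1
or 2 depending on an option. The function names, the flag and the tiny
range tables are hypothetical and deliberately incomplete; a real
implementation such as Markus Kuhn's wcwidth.c uses full tables. ]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Column width of one code point, using a small illustrative subset
 * of the Unicode ranges (NOT a complete table).
 */
static int
codepoint_columns(uint32_t cp, int ambiguous_is_wide)
{
    /* Combining Diacritical Marks: drawn over the previous character. */
    if (cp >= 0x0300 && cp <= 0x036F)
        return 0;

    /* Hiragana letters and CJK Unified Ideographs: two columns. */
    if ((cp >= 0x3041 && cp <= 0x3096) || (cp >= 0x4E00 && cp <= 0x9FFF))
        return 2;

    /* Greek capitals Alpha..Rho: East Asian Ambiguous per TR#11. */
    if (cp >= 0x0391 && cp <= 0x03A1)
        return ambiguous_is_wide ? 2 : 1;

    return 1;
}

/* Sum the column widths of n already-decoded code points. */
static int
string_columns(const uint32_t *cps, size_t n, int ambiguous_is_wide)
{
    int         total = 0;
    size_t      i;

    assert(cps != NULL || n == 0);

    for (i = 0; i < n; i++)
        total += codepoint_columns(cps[i], ambiguous_is_wide);
    return total;
}
```

With a flag like this, the proposed command-line parameter would only
need to set ambiguous_is_wide before any widths are computed.
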
>
>
> BTW, here is the table I had in the first mail. I would have shown the
> one with all the weird Unicode characters, but my mutt is configured
> with iso-8859-15, and I doubt many of you have utf-8 as a default yet
> :)
>
> +------+-------+--------+
> | lang | text  | text   |
> +------+-------+--------+
> | isl  | álíta | áleit  |
> | isl  | álíta | álitum |
> | isl  | álíta | álitið |
> | isl  | maður | mann   |
> | isl  | maður | mönnum |
> | isl  | maður | manna  |
> | isl  | óska  | -aði   |
> +------+-------+--------+
>
>
> The attached files :
> - a diff for pgsql/src/bin/psql/print.c
> - a diff for pgsql/src/bin/psql/Makefile
> - two new files :
> pgsql/src/bin/psql/pg_mb_utf8.c
> pgsql/src/bin/psql/pg_mb_utf8.h
>
> Have fun !
>
> Patrice
>
> --
> Patrice HÉDÉ ------------------------------- patrice ? islande org -----
> -- Isn't it weird how scientists can imagine all the matter of the
> universe exploding out of a dot smaller than the head of a pin, but they
> can't come up with a more evocative name for it than "The Big Bang" ?
> -- What would _you_ call the creation of the universe ?
> -- "The HORRENDOUS SPACE KABLOOIE !" - Calvin and Hobbes
> ------------------------------------------ http://www.islande.org/ -----

[ Attachment, skipping... ]

[ Attachment, skipping... ]

[ Attachment, skipping... ]

[ Attachment, skipping... ]


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
