Re: client side syntax error localisation for psql (v1)

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: coelho(at)cri(dot)ensmp(dot)fr
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: client side syntax error localisation for psql (v1)
Date: 2004-03-12 13:45:42
Message-ID: 20040312.224542.78705812.t-ishii@sra.co.jp
Lists: pgsql-hackers

> > PQmblen returns the storage size, which is not necessarily the same as
> > the character width represented on a terminal. For example, for a kanji
> > character in UTF-8 PQmblen returns 3, but it occupies the space of 2
> > ASCII characters, not 3. Isn't that a problem for you?
>
> If I read you correctly, you mean that 1 character may take 3 bytes
> of storage in the string, but it is not guaranteed to be 1 character
> from the terminal perspective... Argh, that's definitely an issue:-(
> I assumed that one character, whatever the encoding, would take one
> character position on the display.

That's not correct...

One thing I have to note is that some Asian characters, such as
Japanese and Chinese ones, require twice the space on a terminal for
each character compared with plain ASCII characters. This is hard to
explain to those who are not familiar with kanji... Could you take a
look at the included screenshot? As you can see, there are four ASCII
characters in the first line. On the second line there are *two* kanji
characters, and they occupy the same space as the four ASCII characters
above. Moreover, the storage size for the first line is 4 bytes, but the
storage size for the second line varies depending on the encoding. If
the encoding is EUC_JP or SJIS, it takes 4 bytes; however, it takes 6
bytes if the encoding is UTF-8. Got it?
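To make the bytes-versus-columns distinction concrete, here is a minimal C sketch for UTF-8. The helper names (utf8_seq_len, utf8_decode, approx_width, utf8_measure) are hypothetical and not libpq functions. approx_width only checks a rough subset of the Unicode East Asian Wide ranges and ignores combining and control characters, so it is an approximation, not a real wcwidth.

```c
#include <stddef.h>

/* Byte length of the UTF-8 sequence at s; a malformed or truncated
 * sequence is treated as a single byte (assumption for this sketch). */
static int utf8_seq_len(const unsigned char *s)
{
    int len;
    if (s[0] < 0x80) len = 1;
    else if ((s[0] & 0xE0) == 0xC0) len = 2;
    else if ((s[0] & 0xF0) == 0xE0) len = 3;
    else if ((s[0] & 0xF8) == 0xF0) len = 4;
    else return 1;
    for (int i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)   /* stops at NUL, too */
            return 1;
    return len;
}

/* Decode a sequence of the given length into a code point. */
static unsigned int utf8_decode(const unsigned char *s, int len)
{
    unsigned int cp;
    switch (len) {
    case 1:  return s[0];
    case 2:  cp = s[0] & 0x1F; break;
    case 3:  cp = s[0] & 0x0F; break;
    default: cp = s[0] & 0x07; break;
    }
    for (int i = 1; i < len; i++)
        cp = (cp << 6) | (s[i] & 0x3F);
    return cp;
}

/* Hypothetical, simplified display width: 2 columns for some common
 * East Asian wide ranges, 1 column for everything else. */
static int approx_width(unsigned int cp)
{
    if ((cp >= 0x1100 && cp <= 0x115F) ||                 /* Hangul Jamo */
        (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) || /* CJK, kana */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||                 /* Hangul */
        (cp >= 0xF900 && cp <= 0xFAFF) ||                 /* CJK compat */
        (cp >= 0xFF00 && cp <= 0xFF60) ||                 /* fullwidth */
        (cp >= 0xFFE0 && cp <= 0xFFE6))
        return 2;
    return 1;
}

typedef struct { size_t bytes, chars, columns; } text_size;

/* Storage size, character count and approximate terminal columns. */
static text_size utf8_measure(const char *str)
{
    const unsigned char *s = (const unsigned char *) str;
    text_size r = {0, 0, 0};
    while (*s) {
        int len = utf8_seq_len(s);
        r.bytes += (size_t) len;
        r.chars += 1;
        r.columns += (size_t) approx_width(utf8_decode(s, len));
        s += len;
    }
    return r;
}
```

For "abcd" this reports 4 bytes, 4 characters and 4 columns. For the two kanji 漢字 in UTF-8 ("\xe6\xbc\xa2\xe5\xad\x97") it reports 6 bytes, 2 characters and 4 columns: three different numbers for the same string.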

> If it is not the case, I think I can put/compute this information in the
> translation structures that are used by PQmblen, and implement a
> PQmbtermlen function...
>
> Maybe you could point me to some source of information about display
> lengths of characters depending on the encoding?

I could write a "PQmbtermlen" function for every encoding supported by
PostgreSQL except UTF-8. Such info for UTF-8 might be quite complex. I
believe there are mapping tables or functions that provide it somewhere
on the Internet, but I don't remember where.
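For a non-UTF-8 encoding the rule really is short. Below is a sketch for EUC_JP, assuming the usual byte layout: ASCII is one byte, SS2 (0x8E) introduces a two-byte half-width katakana, SS3 (0x8F) introduces a three-byte JIS X 0212 character, and any other high-bit lead byte starts a two-byte JIS X 0208 character. eucjp_mblen, eucjp_termlen and eucjp_columns are hypothetical names modelled on the proposed PQmbtermlen, not PostgreSQL code. Control characters are simply counted as one column.

```c
#include <stddef.h>

#define SS2 0x8E   /* single shift 2: half-width katakana follows */
#define SS3 0x8F   /* single shift 3: JIS X 0212 character follows */

/* Hypothetical: storage size in bytes of the EUC_JP character at s. */
static int eucjp_mblen(const unsigned char *s)
{
    if (*s == SS2) return 2;
    if (*s == SS3) return 3;
    if (*s & 0x80) return 2;   /* JIS X 0208: kanji, full-width kana */
    return 1;                  /* ASCII */
}

/* Hypothetical "PQmbtermlen": terminal columns of the character at s. */
static int eucjp_termlen(const unsigned char *s)
{
    if (*s == SS2) return 1;   /* half-width katakana */
    if (*s & 0x80) return 2;   /* SS3 and JIS X 0208 are double width */
    return 1;
}

/* Total columns of a NUL-terminated EUC_JP string; stores the byte
 * count in *nbytes and never steps past a truncated sequence. */
static size_t eucjp_columns(const char *str, size_t *nbytes)
{
    const unsigned char *s = (const unsigned char *) str;
    size_t cols = 0, bytes = 0;
    while (*s) {
        int len = eucjp_mblen(s);
        for (int i = 1; i < len; i++)
            if (s[i] == '\0') { len = i; break; }
        cols += (size_t) eucjp_termlen(s);
        bytes += (size_t) len;
        s += len;
    }
    if (nbytes) *nbytes = bytes;
    return cols;
}
```

With this, the two kanji 漢字 in EUC_JP ("\xb4\xc1\xbb\xfa") take 4 bytes and 4 columns, while a half-width katakana such as "\x8e\xb1" takes 2 bytes but only 1 column, so storage size and display width diverge even within a single encoding.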

> > I think you can do it safely using PQmblen.
>
> Ok, what you describe is basically what I've done with the qidx
> computation as suggested by Tom Lane and then later I check that the
> encoded length is one to find my special characters.

Oh, ok.
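The qidx computation described in the quoted paragraph can be sketched as follows, assuming the server reports a 1-based character position. utf8_mblen is a stand-in for PQmblen (the real libpq function takes the client encoding as a second argument), and locate_error and err_loc are hypothetical names. Only single-byte characters are tested for '\n' because in encodings such as SJIS a trail byte can coincide with an ASCII code (e.g. 0x5C).

```c
#include <stddef.h>

/* Stand-in for PQmblen with a UTF-8 client encoding. */
static int utf8_mblen(const unsigned char *s)
{
    if (*s < 0x80) return 1;
    if ((*s & 0xE0) == 0xC0) return 2;
    if ((*s & 0xF0) == 0xE0) return 3;
    if ((*s & 0xF8) == 0xF0) return 4;
    return 1;
}

typedef struct {
    size_t qidx;        /* byte offset of the erroneous character */
    size_t line_start;  /* byte offset where its line begins */
    int    line_no;     /* 1-based line number */
    int    col_chars;   /* 0-based column, counted in characters */
} err_loc;

/* errpos: 1-based character position reported by the server. */
static err_loc locate_error(const char *query, int errpos)
{
    const unsigned char *q = (const unsigned char *) query;
    err_loc loc = {0, 0, 1, 0};
    size_t i = 0;

    for (int c = 1; c < errpos && q[i] != '\0'; c++) {
        int len = utf8_mblen(q + i);
        if (len == 1 && q[i] == '\n') {   /* only 1-byte chars can match */
            loc.line_no++;
            loc.line_start = i + 1;
            loc.col_chars = 0;
        } else {
            loc.col_chars++;
        }
        for (int k = 1; k < len; k++)     /* don't run past the NUL */
            if (q[i + k] == '\0') { len = k; break; }
        i += (size_t) len;
    }
    loc.qidx = i;
    return loc;
}
```

For "SELECT\n\xe6\xbc\xa2 foo" with errpos 10 (the "f"), this yields qidx 11, line 2 starting at byte 7, and col_chars 2. On a terminal, however, the caret belongs at column 3, because 漢 is two columns wide; that gap is what a display-width function would have to close.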

> Thanks for your reply,

You are welcome!
--
Tatsuo Ishii
