
Re: This approach to non-ASCII names does not work

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-docs(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, yazicivo(at)ttnet(dot)net(dot)tr
Subject: Re: This approach to non-ASCII names does not work
Date: 2006-09-22 17:17:29
Lists: pgsql-docs
That makes a lot of sense.  The encoding mentioned in the HTML is how
high-bit characters are treated in the HTML, and doesn't control what
entities it supports.

However, I am confused about how non-Latin users can use SGML if it does
not support UTF8 entities.  I see this flag in openjade:

	  -b, --encoding=NAME         Use encoding NAME for output.

but I assume it is only for how to treat the high bits in the file, not
for entity recognition.
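
A minimal Python sketch of that distinction (this is not openjade, and
the sample name with U+0131 is only an assumed example of a character
outside Latin-1).  The encoding decides which characters can appear as
raw bytes; anything it cannot represent has to be written as a numeric
character reference instead:

```python
# Assumed sample text: U+0131 (LATIN SMALL LETTER DOTLESS I) is not in ISO-8859-1.
name = "Yaz\u0131c\u0131"

# UTF-8 can encode every character directly, so no references are needed.
utf8_bytes = name.encode("utf-8")

# ISO-8859-1 has no byte for U+0131, so a plain encode fails...
try:
    name.encode("iso-8859-1")
    latin1_direct_ok = True
except UnicodeEncodeError:
    latin1_direct_ok = False

# ...and the character has to be written as a numeric character reference.
latin1_bytes = name.encode("iso-8859-1", errors="xmlcharrefreplace")

print(utf8_bytes)        # raw high-bit bytes for the dotless i
print(latin1_direct_ok)  # False
print(latin1_bytes)      # ASCII-only bytes containing &#305;
```

Whether the reference is then understood depends on the tool reading
the file, not on the encoding used to write it.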

I IM'ed with Peter and he said DocBook SGML just doesn't support UTF8
easily, so I am reverting Volkan YAZICI's name to ASCII (he requested
an all-uppercase last name if we can't use the proper symbol).  I have
also documented that we can only use HTML4 entities, and updated the
URLs we should use for reference.  I have the official URL and URLs that
show the actual symbols too, which is helpful.

If people have names that contain HTML4 symbols, please let me know so I
can add the symbols:


Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > The unusual thing is that though our docs web pages use a stated
> > encoding as ISO-8859-1, the UTF8 number does generate the proper
> > symbol in my browser (Mozilla), so I wonder if >255 codes are assumed
> > to be UTF8.
> These are two different things.
> A numeric character reference picks the numbered character from the 
> document character set.  The document character set is declared in the 
> document type declaration (and is therefore fixed by the standards 
> committee for all users).  The document character sets for commonly 
> used SGML applications are:
>
> HTML 3.2        Latin 1 (ISO 646 + ECMA 94)
> HTML 4+         UCS (ISO 10646)
> XML             UCS (ISO 10646)
> DocBook SGML    Latin 1 (ISO 646 + ECMA 94)
>
> If a font is available, an HTML application (browser) should be able to 
> process (display) any character from the document character set, 
> whether it arrives in plain or as a character entity.
> Conversely, a character not in the document character set, such as a 
> non-Latin-1 character in DocBook SGML, cannot be processed, strictly 
> speaking.
> The other thing you are talking about is the character *encoding* which 
> specifies how the sequence of bytes that makes up the document is to be 
> interpreted.  Note that this happens before the document character set 
> is taken into consideration and is pretty much independent of it.  For 
> example, knowledge of the character encoding is necessary to find 
> the "&" that starts entities.  Not all character encodings are capable 
> of encoding all characters in the document character set, which is why 
> you need to use character entities to access characters outside the 
> encoding.
> -- 
> Peter Eisentraut
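
To make Peter's two steps concrete, here is a rough Python sketch under
stated assumptions: the function name, the two upper-bound constants,
and the sample bytes are all hypothetical, and real SGML/HTML parsers do
far more than this.  The bytes are decoded with the character encoding
first; only then is each numeric reference looked up in the document
character set:

```python
import re

# Hypothetical upper bounds standing in for two document character sets.
LATIN_1_MAX = 0xFF      # Latin 1, as listed for HTML 3.2 and DocBook SGML
UCS_MAX = 0x10FFFF      # UCS (ISO 10646), as listed for HTML 4+ and XML

_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(raw, encoding, charset_max):
    """Decode bytes first, then resolve decimal character references."""
    # Step 1: the encoding turns bytes into characters, which is also
    # what makes the "&" that starts a reference recognizable.
    text = raw.decode(encoding)

    def replace(match):
        code_point = int(match.group(1))
        # Step 2: the number indexes the document character set.
        if code_point > charset_max:
            raise ValueError(
                "&#%d; is outside the document character set" % code_point
            )
        return chr(code_point)

    return _NUMERIC_REF.sub(replace, text)


raw = b"Yaz&#305;c&#305;"  # assumed sample; pure ASCII bytes
print(resolve_numeric_refs(raw, "iso-8859-1", UCS_MAX))
print(resolve_numeric_refs(raw, "utf-8", UCS_MAX))
```

Both calls yield the same text because the reference is resolved after
decoding, independent of the declared encoding.  Passing `LATIN_1_MAX`
instead makes `&#305;` raise an error, which mirrors the "cannot be
processed, strictly speaking" case for DocBook SGML.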

  Bruce Momjian   bruce(at)momjian(dot)us

  + If your life is a hard drive, Christ can be your backup. +
