Re: XML only working in UTF-8 - Re: 8.4 open items list

From: Sergey Burladyan <eshkinkot(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: XML only working in UTF-8 - Re: 8.4 open items list
Date: 2009-04-07 09:14:26
Message-ID: 87y6ucakr1.fsf@seb.progtech.ru
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> As near as I can tell, every place where you see an explicit cast
> between char * and xmlChar * is probably broken. I think we ought
> to approach this by refactoring to have all those conversions go
> through subroutines, instead of blithely casting.

There is another issue (from sql.ru forum):
seb=> select xmlelement(name язык, xmlattributes('русский' as "значение"));
xmlelement
----------------------------------------------------------------------
<язык значение="&#x440;&#x443;&#x441;&#x441;&#x43A;&#x438;&#x439;"/>

xmlattributes always encodes non-Latin attribute values as numeric character references (HTML-style entities). Both encodings are UTF-8:
server_encoding UTF8
client_encoding UTF8
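
To make the output above easier to read: the escaped attribute value carries the same characters as the input, written as hexadecimal character references instead of raw UTF-8. The following is only a minimal sketch in plain C, not code from xml.c or libxml; the helper names put_utf8 and decode_hex_refs are hypothetical, and the parser handles just the "&#xHHHH;" form seen here.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Write the UTF-8 encoding of code point cp to out; return the byte count. */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char) (0xC0 | (cp >> 6));
        out[1] = (char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char) (0xE0 | (cp >> 12));
        out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char) (0xF0 | (cp >> 18));
    out[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char) (0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Hypothetical helper: copy src to dst, replacing each "&#xHHHH;" reference
 * with the UTF-8 bytes of that code point. Anything else is copied as-is.
 * dst must have room for strlen(src) + 1 bytes; a reference is always at
 * least as long as the bytes it decodes to.
 */
static void decode_hex_refs(const char *src, char *dst)
{
    while (*src) {
        if (src[0] == '&' && src[1] == '#' && src[2] == 'x' &&
            isxdigit((unsigned char) src[3])) {
            char *end;
            unsigned long cp = strtoul(src + 3, &end, 16);

            if (*end == ';' && cp != 0 && cp <= 0x10FFFF) {
                dst += put_utf8(cp, dst);
                src = end + 1;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Passing the attribute value from the query result through decode_hex_refs gives back the UTF-8 bytes of 'русский', so the result is still well-formed XML with the same content; the complaint is only that the attribute value is escaped while the element and attribute names are not.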

This is strange behavior on libxml's part, and I can't find any documentation about it.
The example at http://www.xmlsoft.org/examples/testWriter.c uses xmlTextWriterStartDocument
and sets the output encoding with it. Without that call, all non-Latin node names and their values
are written correctly (as UTF-8), except attribute values, which is strange, IMHO.

xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext) in xml.c
does not call xmlTextWriterStartDocument, and it returns HTML entities in attribute values.

--
Sergey Burladyan
