Re: Encoding problems in PostgreSQL with XML data

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Encoding problems in PostgreSQL with XML data
Date: 2004-01-15 11:10:16
Message-ID: 1074165016.3206.27.camel@fuji.krosing.net
Lists: pgsql-hackers

Merlin Moncure wrote on Wed, 14.01.2004 at 15:49:
> Hannu Krosing wrote:
> > I hope that real as-needed-column-by-column translation will be used
> > with bound argument queries.
> >
> > It also seems possible to delegate the encoding changes to after the
> > query is parsed, but this will never work for EBCDIC and other funny
> > encodings (like rot13 ;).
> >
> > for these we need to define the actual SQL statement encoding on-wire
> > to be always ASCII.
>
> In that case, treat the XML document like a binary stream, using
> PQescapeBytea, etc. to encode if necessary pre-query. Also, the XML
> domain should inherit from bytea, not varchar.

Why?

The allowed character repertoire in XML is even smaller than in varchar.

> The document should be stored bit for bit as was submitted.

Or in some pre-parsed form which allows restoration of the submitted form,
which could be more efficient for things like xpath queries or subtree
extraction.

> If we can do that for bitmaps, why can't we do it for XML documents?
>
> OTOH, if we are transforming the document down to a more generic format
> (either canonical or otherwise), then the xml could be dealt with like
> text in the usual way. Of course, then we are not really storing xml,
> more like 'meta' xml ;)

On the contrary! If there is a DTD, Schema, or other structure definition
for the XML, then we know which whitespace is significant and can do
whatever we like with the insignificant whitespace.
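
As a rough illustration of the whitespace point, here is a minimal Python
sketch using canonicalize() from the standard library's
xml.etree.ElementTree (Python 3.8 or later). It does not read a DTD or
Schema. It simply assumes that all leading and trailing whitespace in text
content is insignificant, which is the knowledge a structure definition
would have to supply. The name same_ignoring_whitespace is made up for
this example.

```python
from xml.etree.ElementTree import canonicalize


def same_ignoring_whitespace(a: str, b: str) -> bool:
    # ASSUMPTION: every leading/trailing whitespace run in text content is
    # insignificant. A real implementation would take this from the DTD or
    # Schema instead of stripping blindly.
    return canonicalize(a, strip_text=True) == canonicalize(b, strip_text=True)


pretty = "<a>\n  <b>x</b>\n</a>"
compact = "<a><b>x</b></a>"

# Equal once insignificant whitespace is dropped ...
print(same_ignoring_whitespace(pretty, compact))
# ... but different if every whitespace character is kept as submitted.
print(canonicalize(pretty) == canonicalize(compact))
```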

It is also OK to store all XML in some Unicode encoding, as this is what
every XML document must be convertible to.
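
To make the encoding point concrete, here is a small Python sketch
(standard library only) that takes the same document in ISO-8859-1 and in
UTF-8 and re-serializes both as UTF-8. The helper name to_utf8 and the
sample documents are invented for this example. It says nothing about how
a PostgreSQL xml type would do the conversion.

```python
import xml.etree.ElementTree as ET


def to_utf8(raw: bytes) -> bytes:
    # The parser honours the encoding declaration in the document itself,
    # so the caller does not need to know the original encoding.
    return ET.tostring(ET.fromstring(raw), encoding="utf-8")


# "\u00f5" is o-with-tilde, which exists in both encodings.
latin1_doc = '<?xml version="1.0" encoding="iso-8859-1"?><d>\u00f5</d>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="utf-8"?><d>\u00f5</d>'.encode("utf-8")

# The submitted byte streams differ ...
print(latin1_doc == utf8_doc)
# ... but after conversion to one Unicode encoding they are identical.
print(to_utf8(latin1_doc) == to_utf8(utf8_doc))
```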

It's the same as storing ints - you don't care whether you specified 1000
or 1e3 when doing the insert, as

hannu=# select 1000=1e3;
 ?column?
----------
 t
(1 row)

In the same way, the following should also be true:

select
'<d/>'::xml == '<?xml version="1.0" encoding="utf-8"?>\n<d/>\n'::xml
;
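
Outside the database, the same comparison can be sketched in Python with
canonicalize() from the standard library's xml.etree.ElementTree (Python
3.8 or later), which implements Canonical XML 2.0. This is only one
possible definition of XML equality, not a proposal for how an xml type's
comparison operator should be implemented. The function name xml_equal is
hypothetical.

```python
from xml.etree.ElementTree import canonicalize


def xml_equal(a: str, b: str) -> bool:
    # Canonicalization drops the XML declaration and whitespace outside the
    # root element, and writes empty elements in a single fixed form.
    return canonicalize(a) == canonicalize(b)


short = "<d/>"
verbose = '<?xml version="1.0" encoding="utf-8"?>\n<d/>\n'

# The two strings differ as text ...
print(short == verbose)
# ... but compare equal as XML documents.
print(xml_equal(short, verbose))
```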

-----------
Hannu
