BUG #4622: xpath only work in utf-8 server encoding

From: "Sergey Burladyan" <eshkinkot(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4622: xpath only work in utf-8 server encoding
Date: 2009-01-22 13:39:00
Message-ID: 200901221339.n0MDd0dE033542@wwwmaster.postgresql.org
Lists: pgsql-bugs
The following bug has been logged online:

Bug reference:      4622
Logged by:          Sergey Burladyan
Email address:      eshkinkot(at)gmail(dot)com
PostgreSQL version: 8.3.5
Operating system:   Debian testing
Description:        xpath only work in utf-8 server encoding
Details: 

Hello, all!

I am testing the parsing of an XML string in an encoding other than UTF-8.
The string loads correctly, but xpath(text, xml) can't handle it:

seb(at)seb:~/tmp/pg$ echo $LANG
ru_RU.CP1251
seb(at)seb:~/tmp/pg$ /usr/lib/postgresql/8.3/bin/postgres -p 5433 -k s -s -D .
LOG:  система была отключена: 2009-01-22 16:30:07 MSK
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

seb(at)seb:~$ echo $LANG
ru_RU.CP1251
seb(at)seb:~$ psql -h localhost -p 5433
Welcome to psql 8.3.5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

seb=# select * from (select
xml('<русский>язык</русский>')) as x(v);
            v
-------------------------
 <русский>язык</русский>
(1 запись)

seb=# select xpath('/русский/text()', v::xml) from (select
xml('<русский>язык</русский>')) as x(v);
ERROR:  could not parse XML data
DETAIL:  Entity: line 1: parser error : Input is not proper UTF-8, indicate
encoding !
Bytes: 0xF0 0xF3 0xF1 0xF1
<x><русский>язык</русский></x>
    ^
seb=# select name, setting from pg_settings where name like 'lc_%' or name
like '%enco%';
      name       |   setting
-----------------+--------------
 client_encoding | WIN1251
 lc_collate      | ru_RU.CP1251
 lc_ctype        | ru_RU.CP1251
 lc_messages     | ru_RU.CP1251
 lc_monetary     | ru_RU.CP1251
 lc_numeric      | ru_RU.CP1251
 lc_time         | ru_RU.CP1251
 server_encoding | WIN1251
(8 rows)

With a UTF-8 server encoding it works correctly:

seb=> select xpath('/русский/text()', v::xml) from (select
xml('<русский>язык</русский>')) as x(v);
 xpath
--------
 {язык}
(1 запись)

seb=> select name, setting from pg_settings where name like 'lc_%' or name
like '%enco%';
      name       |   setting
-----------------+-------------
 client_encoding | UTF8
 lc_collate      | ru_RU.UTF-8
 lc_ctype        | ru_RU.UTF-8
 lc_messages     | ru_RU.UTF-8
 lc_monetary     | ru_RU.UTF-8
 lc_numeric      | ru_RU.UTF-8
 lc_time         | ru_RU.UTF-8
 server_encoding | UTF8
(8 rows)

I think something is wrong here: the string is parsed correctly by xml(text),
but its result can't be passed to the xpath function.
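
The bytes quoted in the error DETAIL (0xF0 0xF3 0xF1 0xF1) match the WIN1251 encoding of the first four letters of the element name, which suggests the document reaches the XML parser without being converted to UTF-8. The Python sketch below only illustrates that reading at the byte level. It does not show PostgreSQL's internals, and the names doc, cp1251_bytes, reported and is_valid_utf8 are hypothetical, chosen for this example. The <x> wrapper is taken from the error message itself.

```python
# Illustration only: reproduces the byte-level symptom outside PostgreSQL.
# The <x>...</x> wrapper is copied from the error DETAIL in the report.
doc = "<x><русский>язык</русский></x>"

# The document as a WIN1251 (CP1251) server would hold it in memory.
cp1251_bytes = doc.encode("cp1251")

# "<x><" occupies bytes 0..3, so bytes 4..7 are the start of the element name.
reported = " ".join(f"0x{b:02X}" for b in cp1251_bytes[4:8])
print(reported)  # 0xF0 0xF3 0xF1 0xF1


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Assumption: the parser treats its input as UTF-8 unless told otherwise.
print(is_valid_utf8(cp1251_bytes))         # False
print(is_valid_utf8(doc.encode("utf-8")))  # True
```

In UTF-8, 0xF0 opens a four-byte sequence and must be followed by a continuation byte. 0xF3 is not one, so a UTF-8 decoder rejects the input at exactly the position the caret marks in the error message.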

