Re: [PATCH] Add CANONICAL option to xmlserialize

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add CANONICAL option to xmlserialize
Date: 2023-03-06 10:50:54
Lists: pgsql-hackers

On 06.03.23 00:32, Thomas Munro wrote:
> I couldn't reproduce that locally either, but I just tested on CI with
> your patch applied, saw the failure, and then removed
> "PYTHONCOERCECLOCALE=0 LANG=C" and it's all green:
> Without looking too closely, my first guess would have been that this
> just isn't going to work without UTF-8 database encoding, so you might
> need to skip the test (see for example
> src/test/regress/expected/unicode_1.out). It's annoying that "xml"
> already has 3 expected variants... hmm. BTW shouldn't it be failing
> in a more explicit way somewhere sooner if the database encoding is
> not UTF-8, rather than getting confused?

I guess the confusion arises because xml_parse() was being called with
the database encoding returned by GetDatabaseEncoding().

I added a step before calling xml_parse() that reads the encoding from
the document's XML declaration and falls back to UTF-8 when none is
declared:

parse_xml_decl(xml_text2xmlChar(data), NULL, NULL, &encodingStr, NULL);
encoding = encodingStr ? xmlChar_to_encoding(encodingStr) : PG_UTF8;

doc = xml_parse(data, XMLOPTION_DOCUMENT, false, encoding, NULL);
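To illustrate the fallback outside the backend, here is a minimal standalone sketch of the same idea: read the encoding pseudo-attribute from a leading XML declaration and default to UTF-8 when it is absent. The functions declared_encoding() and encoding_or_utf8() are hypothetical names invented for this example; they are not PostgreSQL's parse_xml_decl() or xmlChar_to_encoding(), and the scan is deliberately simpler than a real XML declaration parser.

```c
#include <stddef.h>
#include <string.h>

/* Skip spaces, tabs, and newlines, but never move past 'end'. */
const char *
skip_ws(const char *p, const char *end)
{
    while (p < end && (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n'))
        p++;
    return p;
}

/*
 * Hypothetical helper (an assumption for illustration only): copy the value
 * of encoding="..." from a leading XML declaration into 'out'.
 * Returns 1 if an encoding was declared and fits in 'out', 0 otherwise.
 */
int
declared_encoding(const char *xml, char *out, size_t outlen)
{
    const char *end;
    const char *p;
    const char *close;
    char        quote;
    size_t      len;

    /* The declaration must start the document: "<?xml" plus whitespace. */
    if (strncmp(xml, "<?xml", 5) != 0)
        return 0;
    if (xml[5] != ' ' && xml[5] != '\t' && xml[5] != '\r' && xml[5] != '\n')
        return 0;

    end = strstr(xml, "?>");
    if (end == NULL)
        return 0;

    /* Simplification: take the first "encoding" inside the declaration. */
    p = strstr(xml, "encoding");
    if (p == NULL || p >= end)
        return 0;

    p = skip_ws(p + strlen("encoding"), end);
    if (p >= end || *p != '=')
        return 0;
    p = skip_ws(p + 1, end);
    if (p >= end || (*p != '"' && *p != '\''))
        return 0;

    quote = *p++;
    close = memchr(p, quote, (size_t) (end - p));
    if (close == NULL)
        return 0;

    len = (size_t) (close - p);
    if (len == 0 || len >= outlen)
        return 0;

    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}

/* Return the declared encoding name, or "UTF-8" when none is declared. */
const char *
encoding_or_utf8(const char *xml, char *buf, size_t buflen)
{
    return declared_encoding(xml, buf, buflen) ? buf : "UTF-8";
}
```

Called on a document that begins with an XML declaration naming ISO-8859-1, encoding_or_utf8() hands back that name; called on a document with no declaration, or with a declaration that omits the encoding, it hands back "UTF-8", mirroring the PG_UTF8 default in the snippet above.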

v2 attached.


Best, Jim

Attachment Content-Type Size
v2-0001-Add-CANONICAL-format-to-xmlserialize.patch text/x-patch 32.0 KB
