Re: [PATCH] Add CANONICAL option to xmlserialize

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: [PATCH] Add CANONICAL option to xmlserialize
Date: 2026-05-26 09:46:44
Message-ID: 3b0b381b-a89f-4bb1-a1d3-25b2ba7b8907@uni-muenster.de
Lists: pgsql-hackers

On 30/03/2026 13:27, Jim Jones wrote:
> On 30/03/2026 11:44, Andrew Dunstan wrote:
>> I note that your function returns xml, whereas Tom's suggestion was for
>> a function returning text. I don't think there was any discussion on the
>> point.
> Indeed, there was no discussion regarding the return type.
>
> My rationale for keeping it as xml was: the output is xml, callers can
> immediately use the xml without casting, and nearly all other xml*
> functions return xml. Is there a direct advantage of having this
> function return text?

After some consideration, I think returning text instead of xml is the
better choice here. The canonical form is a serialization artifact
rather than a document intended for further XML processing. More
practically, since xml has no = operator, the primary use case of
comparing documents requires a cast anyway, so returning text is closer
to real-world usage.
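
To illustrate the comparison use case outside PostgreSQL, here is a
minimal sketch using Python's standard-library canonicalize(). Note the
assumptions: it implements C14N 2.0 rather than the C14N 1.1 used by the
patch, and the two sample documents are made up for illustration. It
only shows why a canonical text form makes plain string equality
meaningful.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents: same content, different attribute order,
# quoting, whitespace inside the tag, and empty-element syntax.
a = '<doc b="2" a="1"><item/></doc>'
b = "<doc a='1'   b='2'><item></item></doc>"

# The raw strings differ ...
raw_equal = (a == b)

# ... but their canonical forms are ordinary text, so string equality
# is enough to decide whether the documents are equivalent.
same = (canonicalize(a) == canonicalize(b))

print(raw_equal, same)
```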

I also noticed a correctness issue when the database encoding is not
UTF-8: the C14N 1.1 specification mandates UTF-8 output, so
xmlC14NDocDumpMemory always returns UTF-8.[1] I added a pg_any_to_server
call to convert the output from UTF-8 to the server encoding before
returning.
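
The effect of skipping that conversion can be sketched in a few lines of
Python. This is an illustration only, not the patch's C code: the
SERVER_ENCODING value and the sample document are assumptions, and
Python's canonicalize() (C14N 2.0) stands in for the libxml2 call.

```python
from xml.etree.ElementTree import canonicalize

SERVER_ENCODING = "latin-1"  # hypothetical non-UTF-8 server encoding

# Canonical output is defined as UTF-8, so model it as UTF-8 bytes.
canonical_utf8 = canonicalize("<doc>Münster</doc>").encode("utf-8")

# Without conversion, the UTF-8 bytes are misread as server-encoded
# text and the two-byte sequence for "ü" turns into two characters.
misread = canonical_utf8.decode(SERVER_ENCODING)

# With conversion: decode as UTF-8, then re-encode for the server.
converted = canonical_utf8.decode("utf-8").encode(SERVER_ENCODING)

print(misread)
print(converted)
```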

Best, Jim

[1] https://github.com/GNOME/libxml2/blob/174201f747da93167354287a7599d0b385552599/c14n.c#L1964

Attachment: v25-0001-Add-xmlcanonicalize-function.patch (text/x-patch, 32.5 KB)
