Re: [PATCH] Add CANONICAL option to xmlserialize

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add CANONICAL option to xmlserialize
Date: 2023-10-04 09:39:44
Message-ID: CALDaNm3LsLy-yrzmo-x_uz7Ev3JjMMv7zYeu0X315RNecJDEWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 17 Mar 2023 at 18:01, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> After some more testing I realized that v5 was leaking the xmlDocPtr.
>
> Now fixed in v6.

Few comments:
1) Why the default option was chosen without comments shouldn't it be
the other way round?
+opt_xml_serialize_format:
+ INDENT
{ $$ = XMLSERIALIZE_INDENT; }
+ | NO INDENT
{ $$ = XMLSERIALIZE_NO_FORMAT; }
+ | CANONICAL
{ $$ = XMLSERIALIZE_CANONICAL; }
+ | CANONICAL WITH NO COMMENTS
{ $$ = XMLSERIALIZE_CANONICAL; }
+ | CANONICAL WITH COMMENTS
{ $$ = XMLSERIALIZE_CANONICAL_WITH_COMMENTS; }
+ | /*EMPTY*/
{ $$ = XMLSERIALIZE_NO_FORMAT; }

2) This should be added to typedefs.list:
+typedef enum XmlSerializeFormat
+{
+ XMLSERIALIZE_INDENT, /*
pretty-printed xml serialization */
+ XMLSERIALIZE_CANONICAL, /*
canonical form without xml comments */
+ XMLSERIALIZE_CANONICAL_WITH_COMMENTS, /* canonical form with
xml comments */
+ XMLSERIALIZE_NO_FORMAT /*
unformatted xml representation */
+} XmlSerializeFormat;

3) This change is not required:
return result;
+
#else
NO_XML_SUPPORT();
return NULL;

4) This comment body needs slight reformatting:
+ /*
+ * Parse the input according to the xmloption.
+ * XML canonical expects a well-formed XML input, so here in case of
+ * XMLSERIALIZE_CANONICAL or XMLSERIALIZE_CANONICAL_WITH_COMMENTS we
+ * force xml_parse() to parse 'data' as XMLOPTION_DOCUMENT despite
+ * of the XmlOptionType given in 'xmloption_arg'. This enables the
+ * canonicalization of CONTENT fragments if they contain a singly-rooted
+ * XML - xml_parse() will thrown an error otherwise.
+ */

5) Similarly here too:
- if (newline == NULL || xmlerrcxt->err_occurred)
+ * Emit declaration only if the input had one.
Note: some versions of
+ * xmlSaveToBuffer leak memory if a non-null
encoding argument is
+ * passed, so don't do that. We don't want any
encoding conversion
+ * anyway.
+ */
+ if (decl_len == 0)

6) Similarly here too:
+ /*
+ * Deal with the case where we have
non-singly-rooted XML.
+ * libxml's dump functions don't work
well for that without help.
+ * We build a fake root node that
serves as a container for the
+ * content nodes, and then iterate over
the nodes.
+ */

7) Similarly here too:
+ /*
+ * We use this node to insert newlines
in the dump. Note: in at
+ * least some libxml versions,
xmlNewDocText would not attach the
+ * node to the document even if we
passed it. Therefore, manage
+ * freeing of this node manually, and
pass NULL here to make sure
+ * there's not a dangling link.
+ */

8) Should this:
+ * of the XmlOptionType given in 'xmloption_arg'. This enables the
+ * canonicalization of CONTENT fragments if they contain a singly-rooted
+ * XML - xml_parse() will thrown an error otherwise.
Be:
+ * of the XmlOptionType given in 'xmloption_arg'. This enables the
+ * canonicalization of CONTENT fragments if they contain a singly-rooted
+ * XML - xml_parse() will throw an error otherwise.

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message pgchem pgchem 2023-10-04 09:56:51 Is the logfile the only place to find the finish LSN?
Previous Message Michael Paquier 2023-10-04 08:19:40 Re: Rethink the wait event names for postgres_fdw, dblink and etc