Re: [PATCH] Add CANONICAL option to xmlserialize

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add CANONICAL option to xmlserialize
Date: 2023-10-04 16:19:18
Message-ID: da7a862e-2a6e-d390-0ac0-a58ddb1454e2@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Vignesh

Thanks for the thorough review!

On 04.10.23 11:39, vignesh C wrote:
> Few comments:
> 1) Why the default option was chosen without comments shouldn't it be
> the other way round?
> +opt_xml_serialize_format:
> + INDENT
> { $$ = XMLSERIALIZE_INDENT; }
> + | NO INDENT
> { $$ = XMLSERIALIZE_NO_FORMAT; }
> + | CANONICAL
> { $$ = XMLSERIALIZE_CANONICAL; }
> + | CANONICAL WITH NO COMMENTS
> { $$ = XMLSERIALIZE_CANONICAL; }
> + | CANONICAL WITH COMMENTS
> { $$ = XMLSERIALIZE_CANONICAL_WITH_COMMENTS; }
> + | /*EMPTY*/
> { $$ = XMLSERIALIZE_NO_FORMAT; }
I'm not sure it is the way to go. The main idea is to check if two
documents have the same content, and comments might be different even if
the contents of two documents are identical. What are your concerns
regarding this default behaviour?
> 2) This should be added to typedefs.list:
> +typedef enum XmlSerializeFormat
> +{
> + XMLSERIALIZE_INDENT, /*
> pretty-printed xml serialization */
> + XMLSERIALIZE_CANONICAL, /*
> canonical form without xml comments */
> + XMLSERIALIZE_CANONICAL_WITH_COMMENTS, /* canonical form with
> xml comments */
> + XMLSERIALIZE_NO_FORMAT /*
> unformatted xml representation */
> +} XmlSerializeFormat;
added.
> 3) This change is not required:
> return result;
> +
> #else
> NO_XML_SUPPORT();
> return NULL;
removed.
>
> 4) This comment body needs slight reformatting:
> + /*
> + * Parse the input according to the xmloption.
> + * XML canonical expects a well-formed XML input, so here in case of
> + * XMLSERIALIZE_CANONICAL or XMLSERIALIZE_CANONICAL_WITH_COMMENTS we
> + * force xml_parse() to parse 'data' as XMLOPTION_DOCUMENT despite
> + * of the XmlOptionType given in 'xmloption_arg'. This enables the
> + * canonicalization of CONTENT fragments if they contain a singly-rooted
> + * XML - xml_parse() will thrown an error otherwise.
> + */
reformatted.
> 5) Similarly here too:
> - if (newline == NULL || xmlerrcxt->err_occurred)
> + * Emit declaration only if the input had one.
> Note: some versions of
> + * xmlSaveToBuffer leak memory if a non-null
> encoding argument is
> + * passed, so don't do that. We don't want any
> encoding conversion
> + * anyway.
> + */
> + if (decl_len == 0)
reformatted.
> 6) Similarly here too:
> + /*
> + * Deal with the case where we have
> non-singly-rooted XML.
> + * libxml's dump functions don't work
> well for that without help.
> + * We build a fake root node that
> serves as a container for the
> + * content nodes, and then iterate over
> the nodes.
> + */
reformatted.
> 7) Similarly here too:
> + /*
> + * We use this node to insert newlines
> in the dump. Note: in at
> + * least some libxml versions,
> xmlNewDocText would not attach the
> + * node to the document even if we
> passed it. Therefore, manage
> + * freeing of this node manually, and
> pass NULL here to make sure
> + * there's not a dangling link.
> + */
reformatted.
> 8) Should this:
> + * of the XmlOptionType given in 'xmloption_arg'. This enables the
> + * canonicalization of CONTENT fragments if they contain a singly-rooted
> + * XML - xml_parse() will thrown an error otherwise.
> Be:
> + * of the XmlOptionType given in 'xmloption_arg'. This enables the
> + * canonicalization of CONTENT fragments if they contain a singly-rooted
> + * XML - xml_parse() will throw an error otherwise.

I didn't understand the suggestion in 8) :)

Thanks again for the review. Much appreciated!

v7 attached.

Best, Jim

Attachment Content-Type Size
v7-0001-Add-CANONICAL-output-format-to-xmlserialize.patch text/x-patch 42.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-10-04 16:24:36 Re: --sync-method isn't documented to take an argument
Previous Message Nathan Bossart 2023-10-04 16:12:14 Re: --sync-method isn't documented to take an argument