Re: Fix XML handling with DOCTYPE

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Ryan Lambert <ryan(at)rustprooflabs(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Fix XML handling with DOCTYPE
Date: 2019-03-27 05:05:27
Message-ID: 5C9B0497.9020802@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/26/19 21:39, Ryan Lambert wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: tested, passed

Thanks for the review!

> I have two recommendations for features.sgml. You state:
>
>> relies on the libxml library
>
> Should this be clarified as the libxml2 library? That's what I installed
> to build postgres from source (Ubuntu 16/18). If it is the libxml library
> and the "2" is irrelevant

That's a good catch. I'm not actually sure whether there is any "libxml"
library that isn't libxml2. Maybe there was once and nobody admits to
hanging out with it. Most Google hits on "libxml" seem to be modules
that have libxml in their names and libxml2 as their actual dependency.

Perl XML:LibXML: "This module is an interface to libxml2"
Haskell libxml: "Binding to libxml2"
libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
for the GNOME Libxml2 ..."

--with-libxml is the PostgreSQL configure option to make it use libxml2.

The very web page http://xmlsoft.org/index.html says "The XML C parser
and toolkit of Gnome: libxml" and is all about libxml2.

So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?

On 03/26/19 23:52, Tom Lane wrote:
> Do we need to mention that at all? If you're not building from source,
> it doesn't seem very interesting ... but maybe I'm missing some reason
> why end users would care.

The three places I've mentioned it were the ones where I thought users
might care:

- why are we stuck at XPath 1.0? It's what we get from the library we use.

- in what order do we get things out from a (hmm) node-set? Per XPath 1.0,
it's indeterminate (it's a set!), unlike XPath 2.0/XQuery where there's
a sequence type and you have order control. Observable behavior from
libxml2 (and you could certainly want to know this) is you get things out
in document order, whether that's what you wanted or not, even though
this is undocumented, and even counter-documented[1], libxml2 behavior.
So it's an example of something you would fundamentally like to know,
where the only available answer depends precariously on the library
we happen to be using.

- which limits in our implementation are inherent to the library, and
which are just current limits in our embedding of it? (Maybe this is
right at the border of what a user would care to know, but I know it's
a question that crosses my mind when I bonk into a limit I wasn't
expecting.)

> There are a few places where the parenthesis around a block of text
> seem unnecessary.

)blush( that's a long-standing wart in my writing ... seems I often think
in parentheses, then look and say "those aren't needed" and take them out,
only sometimes I don't.

I skimmed just now and found a few instances of parenthesized whole
sentence: the one you quoted, and some (if argument is null, the result
is null), and (No rows will be produced if ....). Shall I deparenthesize
them all? Did you have other instances in mind?

> It seems you are standardizing from "node set" to "nodeset", is that
> the preferred nomenclature or preference?

Another good catch. I remember consciously making a last pass to get them
all consistent, and I wanted them consistent with the spec, and I see now
I messed up.

XPath 1.0 [2] has zero instances of "nodeset", two of "node set" and about
six dozen of "node-set". The only appearances of "node set" without the
hyphen are in a heading and its ToC entry. The stuff under that heading
consistently uses node-set. It seems that's the XPath 1.0 term for sure.

When I made my consistency pass, I must have been looking too recently
in libxml2 C source, rather than the spec.

On 03/26/19 23:52, Tom Lane wrote:
> That seemed a bit jargon-y to me too. If that's standard terminology
> in the XML world, maybe it's fine; but I'd have stuck with "node set".

It really was my intention (though I flubbed it) to use XPath 1.0's term
for XPath 1.0's concept; in my doc philosophy, that gives readers
the most breadcrumbs to follow for the rest of the details if they want
them. "Node set" might be some sort of squishy expository concept I'm
using, but node-set is a thing, in a spec.

If you agree, I should go through and fix my nodesets to be node-sets.

I do think the terminology matters here, especially because of the
differences between what you can do with a node-set (XPath 1.0 thing)
and with a sequence (XPath 2.0+,XQuery,SQL/XML thing).

Let me know what you'd like best on these points and I'll revise the patch.

Regards,
-Chap

[1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
in no particular order"

[2] https://www.w3.org/TR/1999/REC-xpath-19991116/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-03-27 05:13:42 Re: speeding up planning with partitions
Previous Message Amit Langote 2019-03-27 04:50:59 Re: speeding up planning with partitions