Re: Fix XML handling with DOCTYPE

From: Ryan Lambert <ryan(at)rustprooflabs(dot)com>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Fix XML handling with DOCTYPE
Date: 2019-03-27 12:22:43
Message-ID: CAN-V+g8NZHpvb+_kOaZTAq0azBBoQg1WaNU7+wDjvMTd2vejjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for putting up with a new reviewer!

--with-libxml is the PostgreSQL configure option to make it use libxml2.
>

> The very web page http://xmlsoft.org/index.html says "The XML C parser
> and toolkit of Gnome: libxml" and is all about libxml2.
>

> So I think I was unsure what convention to follow, and threw up my hands
> and went with libxml. I could just as easily throw them up and go with
> libxml2. Which do you think would be better?

I think leaving it as libxml makes sense with all that. Good point that
--with-libxml is used to build so I think staying with that works and is
consistent. I agree that having this point included does clarify the how
and why of the limitations of this implementation.

I also over-parenthesize so I'm used to looking for that in my own
writing. The full sentences were the ones that seemed excessive to me, I
think the others are ok and I won't nit-pick either way on those (unless
you want me to!).

If you agree, I should go through and fix my nodesets to be node-sets.

Yes, I like node-sets better, especially knowing it conforms to the spec's
language.

Thanks,

*Ryan Lambert*

On Tue, Mar 26, 2019 at 11:05 PM Chapman Flack <chap(at)anastigmatix(dot)net>
wrote:

> On 03/26/19 21:39, Ryan Lambert wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world: tested, passed
> > Implements feature: tested, passed
> > Spec compliant: not tested
> > Documentation: tested, passed
>
> Thanks for the review!
>
> > I have two recommendations for features.sgml. You state:
> >
> >> relies on the libxml library
> >
> > Should this be clarified as the libxml2 library? That's what I installed
> > to build postgres from source (Ubuntu 16/18). If it is the libxml
> library
> > and the "2" is irrelevant
>
> That's a good catch. I'm not actually sure whether there is any "libxml"
> library that isn't libxml2. Maybe there was once and nobody admits to
> hanging out with it. Most Google hits on "libxml" seem to be modules
> that have libxml in their names and libxml2 as their actual dependency.
>
> Perl XML:LibXML: "This module is an interface to libxml2"
> Haskell libxml: "Binding to libxml2"
> libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
> for the GNOME Libxml2 ..."
>
> --with-libxml is the PostgreSQL configure option to make it use libxml2.
>
> The very web page http://xmlsoft.org/index.html says "The XML C parser
> and toolkit of Gnome: libxml" and is all about libxml2.
>
> So I think I was unsure what convention to follow, and threw up my hands
> and went with libxml. I could just as easily throw them up and go with
> libxml2. Which do you think would be better?
>
> On 03/26/19 23:52, Tom Lane wrote:
> > Do we need to mention that at all? If you're not building from source,
> > it doesn't seem very interesting ... but maybe I'm missing some reason
> > why end users would care.
>
> The three places I've mentioned it were the ones where I thought users
> might care:
>
> - why are we stuck at XPath 1.0? It's what we get from the library we use.
>
> - in what order do we get things out from a (hmm) node-set? Per XPath 1.0,
> it's indeterminate (it's a set!), unlike XPath 2.0/XQuery where there's
> a sequence type and you have order control. Observable behavior from
> libxml2 (and you could certainly want to know this) is you get things
> out
> in document order, whether that's what you wanted or not, even though
> this is undocumented, and even counter-documented[1], libxml2 behavior.
> So it's an example of something you would fundamentally like to know,
> where the only available answer depends precariously on the library
> we happen to be using.
>
> - which limits in our implementation are inherent to the library, and
> which are just current limits in our embedding of it? (Maybe this is
> right at the border of what a user would care to know, but I know it's
> a question that crosses my mind when I bonk into a limit I wasn't
> expecting.)
>
> > There are a few places where the parenthesis around a block of text
> > seem unnecessary.
>
> )blush( that's a long-standing wart in my writing ... seems I often think
> in parentheses, then look and say "those aren't needed" and take them out,
> only sometimes I don't.
>
> I skimmed just now and found a few instances of parenthesized whole
> sentence: the one you quoted, and some (if argument is null, the result
> is null), and (No rows will be produced if ....). Shall I deparenthesize
> them all? Did you have other instances in mind?
>
> > It seems you are standardizing from "node set" to "nodeset", is that
> > the preferred nomenclature or preference?
>
> Another good catch. I remember consciously making a last pass to get them
> all consistent, and I wanted them consistent with the spec, and I see now
> I messed up.
>
> XPath 1.0 [2] has zero instances of "nodeset", two of "node set" and about
> six dozen of "node-set". The only appearances of "node set" without the
> hyphen are in a heading and its ToC entry. The stuff under that heading
> consistently uses node-set. It seems that's the XPath 1.0 term for sure.
>
> When I made my consistency pass, I must have been looking too recently
> in libxml2 C source, rather than the spec.
>
> On 03/26/19 23:52, Tom Lane wrote:
> > That seemed a bit jargon-y to me too. If that's standard terminology
> > in the XML world, maybe it's fine; but I'd have stuck with "node set".
>
> It really was my intention (though I flubbed it) to use XPath 1.0's term
> for XPath 1.0's concept; in my doc philosophy, that gives readers
> the most breadcrumbs to follow for the rest of the details if they want
> them. "Node set" might be some sort of squishy expository concept I'm
> using, but node-set is a thing, in a spec.
>
> If you agree, I should go through and fix my nodesets to be node-sets.
>
> I do think the terminology matters here, especially because of the
> differences between what you can do with a node-set (XPath 1.0 thing)
> and with a sequence (XPath 2.0+,XQuery,SQL/XML thing).
>
> Let me know what you'd like best on these points and I'll revise the patch.
>
> Regards,
> -Chap
>
>
> [1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
> in no particular order"
>
> [2] https://www.w3.org/TR/1999/REC-xpath-19991116/
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-03-27 12:27:08 Re: Transaction commits VS Transaction commits (with parallel) VS query mean time
Previous Message Thomas Munro 2019-03-27 11:44:17 Re: Usage of epoch in txid_current