From: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Vik Fearing <vik(at)postgresfriends(dot)org> |
Subject: | Re: [PoC] XMLCast (SQL/XML X025) |
Date: | 2025-05-19 23:11:07 |
Message-ID: | a2972522-c359-421d-a513-b63f8625dce8@uni-muenster.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Robert
On 19.05.25 16:26, Robert Haas wrote:
> Well, this patch is now more than 10 months old, and it's still the
> case that nobody other than the author has said that they want this.
> Is it time to give up?
Not quite yet -- unless there is an expiration date that I am not aware
of :). If we decide we don't need XMLCast on Postgres after all, I'd
suggest to delete it from the todo list on the wiki [1] - I've already
added a link to this thread there.
> I still don't think it's very clear either from the patch or from the
> thread how this differs from existing facilities. As far as I can see,
> there are zero comments in the patch explaining the design decisions
> that it makes, and nothing in the commit message, the comments, or
> even the thread itself addressing my concern from my previous review:
> how is this consistent, or inconsistent, with what we do in other
> similar cases, and why?
I guess I misunderstood your message last year. I was under the
impression that you were only concerned about the handling of intervals,
which I replied here [2]. Basically I said that XMLTable is doing pretty
much the same ...
WITH j (val) AS (
SELECT
'<foo>
<interval>P1Y2M3DT4H5M6S</interval>
<timestamp>2002-05-30T09:30:10</timestamp>
<integer>42</integer>
<numeric>-42.73</numeric>
<text>foo & <"bar"></text>
<boolean>false</boolean>
</foo>'::xml
)
SELECT a, b, c, d, e, f
FROM j,
XMLTABLE(
'/foo'
PASSING val
COLUMNS
a interval PATH 'interval',
b timestamp PATH 'timestamp',
c integer PATH 'integer',
d numeric PATH 'numeric',
e text PATH 'text',
f boolean PATH 'boolean');
a | b | c | d
| e | f
-------------------------------+---------------------+----+--------+---------------+---
1 year 2 mons 3 days 04:05:06 | 2002-05-30 09:30:10 | 42 | -42.73 | foo
& <"bar"> | f
(1 row)
... and cited the spec:
6.7 <XML cast specification>: Syntax Rules
...
15 e)
* i) If the type designator of SQLT is DATE, then let XT be xs:date.
* ii) If the type designator of SQLT is TIME WITH TIME ZONE, then let
XT be xs:time.
* iii) If the type designator of SQLT is TIME WITHOUT TIME ZONE, then
let XT be xs:time.
* iv) If the type designator of SQLT is TIMESTAMP WITH TIME ZONE, then
let XT be xs:dateTime.
* v) If the type designator of SQLT is TIMESTAMP WITHOUT TIME ZONE,
then let XT be xs:dateTime.
and since you didn't reply, I assumed I had already addressed your
comments. But now I see it was not the case.
> To make that more concrete, the patch says:
>
> + Another option to convert character strings into xml is the
> function <function>xmlcast</function>,
> + which is designed to cast SQL data types into <type>xml</type>,
> and vice versa.
>
> But while it explains the behavior of this option, it does not explain
> how the behavior is the same or different from other options, or why.
I'm not entirely sure what you mean by "other options". Similar
alternatives are
* CAST: simply treats the input as a textual XML value and does not
perform type-aware formatting.
* XMLTABLE: extracts typed values from an XML document using
XQuery/XPath expressions, but does not itself convert SQL types into XML.
SELECT xmlcast('2 years 1 day'::interval AS xml);
xmlcast
---------
P2Y1D
(1 row)
SELECT cast('2 years 1 day'::interval AS xml);
ERROR: cannot cast type interval to xml
SELECT xmlcast(now() AS xml);
xmlcast
---------------------------------
2025-05-19T20:16:47.58815+02:00
(1 row)
SELECT cast(now() AS xml);
ERROR: cannot cast type timestamp with time zone to xml
Is it what you meant?
> In the comments it says:
>
> + /* These data types must be converted to their ISO 8601 representations */
>
> To me this just begs the question "says who?".
Says the SQL/XML:2023 standard :)
SQL/XML:2023 (ISO/IEC 9075-14:2023) - “General Rules” of §6.7.3 (d.ii.1
and d.ii.2):
If SD is a year-month interval type, then let XSBT be the XQuery simple
type xs:yearMonthDuration.
If SD is a day-time interval type, then let XSBT be the XQuery simple
type xs:dayTimeDuration.
... and since xs:yearMonthDuration and xs:dayTimeDuration are subsets of
xs:duration, and the lexical representation of this type is ISO 8601, I
inferred that it is the expected behaviour.
> I think there should be
> a bunch of comments in this referencing whatever document specifies
> the behavior of XMLCAST, so that someone who is good at reading
> specification documents (not me) can compare the implementation with
> the spec and see if they agree with the decisions that were made.
I re-wrote the commit message and some the code comments to try to make
things clearer.
> + default:
> + *op->resvalue = PointerGetDatum(DatumGetTextP(value));
> + break;
>
> This doesn't seem very safe at all. If we know what type OIDs we
> intend this to handle, then we could list them out explicitly as is
> already done for TEXTOID, VARCHAROID, etc. If we don't, then why
> should we believe that it's a data type for which DatumGetTextP will
> produce a non-garbage return value? Maybe there's an answer to that
> question, but there's no comment spelling it out; or maybe it's
> actually just broken.
Given that XMLCast converts values between SQL and XML and vice versa,
my rationale was that if the target type is not a text type (such as
TEXTOID, VARCHAROID, or NAMEOID), then the cast operand must be of type
xml, which makes this default: safe.
SELECT xmlcast('2 years 1 day'::interval AS interval);
ERROR: cannot cast from 'interval' to 'interval'
SELECT xmlcast(now() AS timestamp);
ERROR: cannot cast from 'timestamp with time zone' to 'timestamp
without time zone'
see "static Node *transformXmlCast(ParseState *pstate, XmlCast *xc)" in
parse_target.c
But I can see it looks unsafe. Do you have something like this in mind?
case INT2OID:
case INT4OID:
case INT8OID:
case NUMERICOID:
case FLOAT4OID:
case FLOAT8OID:
case BOOLOID:
case TIMESTAMPOID:
case TIMESTAMPTZOID:
case TIMEOID:
case TIMETZOID:
case DATEOID:
case BYTEAOID:
case INTERVALOID:
*op->resvalue = PointerGetDatum(DatumGetTextP(value));
break;
default:
elog(ERROR, "unsupported target data type for XMLCast");
}
Thanks for the review. Much appreciated.
v10 attached.
Best, Jim
1 - https://wiki.postgresql.org/wiki/Todo
2 - https://www.postgresql.org/message-id/998465cb-2fda-4497-8194-87da56748186%40uni-muenster.de
Attachment | Content-Type | Size |
---|---|---|
v10-0001-Add-XMLCast-function-SQL-XML-X025.patch | text/x-patch | 124.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-05-19 23:38:47 | Re: Regression in statement locations |
Previous Message | Dmitry Koval | 2025-05-19 22:35:45 | Re: Add SPLIT PARTITION/MERGE PARTITIONS commands |