Re: [PoC] XMLCast (SQL/XML X025)

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: [PoC] XMLCast (SQL/XML X025)
Date: 2025-05-19 23:11:07
Message-ID: a2972522-c359-421d-a513-b63f8625dce8@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert

On 19.05.25 16:26, Robert Haas wrote:
> Well, this patch is now more than 10 months old, and it's still the
> case that nobody other than the author has said that they want this.
> Is it time to give up?
Not quite yet -- unless there is an expiration date that I am not aware
of :). If we decide we don't need XMLCast on Postgres after all, I'd
suggest to delete it from the todo list on the wiki [1] - I've already
added a link to this thread there.
> I still don't think it's very clear either from the patch or from the
> thread how this differs from existing facilities. As far as I can see,
> there are zero comments in the patch explaining the design decisions
> that it makes, and nothing in the commit message, the comments, or
> even the thread itself addressing my concern from my previous review:
> how is this consistent, or inconsistent, with what we do in other
> similar cases, and why?

I guess I misunderstood your message last year. I was under the
impression that you were only concerned about the handling of intervals,
which I replied here [2]. Basically I said that XMLTable is doing pretty
much the same ...

WITH j (val) AS (
 SELECT
  '<foo>
    <interval>P1Y2M3DT4H5M6S</interval>
    <timestamp>2002-05-30T09:30:10</timestamp>
    <integer>42</integer>
    <numeric>-42.73</numeric>
    <text>foo &amp; &lt;&quot;bar&quot;&gt;</text>
    <boolean>false</boolean>
  </foo>'::xml
)
SELECT a, b, c, d, e, f
FROM j,
  XMLTABLE(
    '/foo'
    PASSING val
    COLUMNS
      a interval PATH 'interval',
      b timestamp PATH 'timestamp',
      c integer PATH 'integer',
      d numeric PATH 'numeric',
      e text PATH 'text',
      f boolean PATH 'boolean');
               a               |          b          | c  |   d   
|       e       | f
-------------------------------+---------------------+----+--------+---------------+---
 1 year 2 mons 3 days 04:05:06 | 2002-05-30 09:30:10 | 42 | -42.73 | foo
& <"bar"> | f
(1 row)

... and cited the spec:

6.7 <XML cast specification>: Syntax Rules
...
15 e)
 * i)   If the type designator of SQLT is DATE, then let XT be xs:date.
 * ii)  If the type designator of SQLT is TIME WITH TIME ZONE, then let
XT be xs:time.
 * iii) If the type designator of SQLT is TIME WITHOUT TIME ZONE, then
let XT be xs:time.
 * iv)  If the type designator of SQLT is TIMESTAMP WITH TIME ZONE, then
let XT be xs:dateTime.
 * v)   If the type designator of SQLT is TIMESTAMP WITHOUT TIME ZONE,
then let XT be xs:dateTime.

and since you didn't reply, I assumed I had already addressed your
comments. But now I see it was not the case.

> To make that more concrete, the patch says:
>
> + Another option to convert character strings into xml is the
> function <function>xmlcast</function>,
> + which is designed to cast SQL data types into <type>xml</type>,
> and vice versa.
>
> But while it explains the behavior of this option, it does not explain
> how the behavior is the same or different from other options, or why.

I'm not entirely sure what you mean by "other options". Similar
alternatives are

* CAST: simply treats the input as a textual XML value and does not
perform type-aware formatting.
* XMLTABLE: extracts typed values from an XML document using
XQuery/XPath expressions, but does not itself convert SQL types into XML.

SELECT xmlcast('2 years 1 day'::interval AS xml);
 xmlcast
---------
 P2Y1D
(1 row)

SELECT cast('2 years 1 day'::interval AS xml);
ERROR:  cannot cast type interval to xml

SELECT xmlcast(now() AS xml);
             xmlcast             
---------------------------------
 2025-05-19T20:16:47.58815+02:00
(1 row)

SELECT cast(now() AS xml);
ERROR:  cannot cast type timestamp with time zone to xml

Is it what you meant?

> In the comments it says:
>
> + /* These data types must be converted to their ISO 8601 representations */
>
> To me this just begs the question "says who?".

Says the SQL/XML:2023 standard :)

SQL/XML:2023 (ISO/IEC 9075-14:2023) - “General Rules” of §6.7.3 (d.ii.1
and d.ii.2):

If SD is a year-month interval type, then let XSBT be the XQuery simple
type xs:yearMonthDuration.
If SD is a day-time interval type, then let XSBT be the XQuery simple
type xs:dayTimeDuration.

... and since xs:yearMonthDuration and xs:dayTimeDuration are subsets of
xs:duration, and the lexical representation of this type is ISO 8601, I
inferred that it is the expected behaviour.

> I think there should be
> a bunch of comments in this referencing whatever document specifies
> the behavior of XMLCAST, so that someone who is good at reading
> specification documents (not me) can compare the implementation with
> the spec and see if they agree with the decisions that were made.
I re-wrote the commit message and some the code comments to try to make
things clearer.
> + default:
> + *op->resvalue = PointerGetDatum(DatumGetTextP(value));
> + break;
>
> This doesn't seem very safe at all. If we know what type OIDs we
> intend this to handle, then we could list them out explicitly as is
> already done for TEXTOID, VARCHAROID, etc. If we don't, then why
> should we believe that it's a data type for which DatumGetTextP will
> produce a non-garbage return value? Maybe there's an answer to that
> question, but there's no comment spelling it out; or maybe it's
> actually just broken.

Given that XMLCast converts values between SQL and XML and vice versa,
my rationale was that if the target type is not a text type (such as
TEXTOID, VARCHAROID, or NAMEOID), then the cast operand must be of type
xml, which makes this default: safe.

SELECT xmlcast('2 years 1 day'::interval AS interval);
ERROR:  cannot cast from 'interval' to 'interval'

SELECT xmlcast(now() AS timestamp);
ERROR:  cannot cast from 'timestamp with time zone' to 'timestamp
without time zone'
               
see "static Node *transformXmlCast(ParseState *pstate, XmlCast *xc)" in
parse_target.c

But I can see it looks unsafe. Do you have something like this in mind?

case INT2OID:
case INT4OID:
case INT8OID:
case NUMERICOID:
case FLOAT4OID:
case FLOAT8OID:
case BOOLOID:
case TIMESTAMPOID:
case TIMESTAMPTZOID:
case TIMEOID:
case TIMETZOID:
case DATEOID:
case BYTEAOID:
case INTERVALOID:
    *op->resvalue = PointerGetDatum(DatumGetTextP(value));
    break;
default:
    elog(ERROR, "unsupported target data type for XMLCast");
}

Thanks for the review. Much appreciated.

v10 attached.

Best, Jim

1 - https://wiki.postgresql.org/wiki/Todo
2 - https://www.postgresql.org/message-id/998465cb-2fda-4497-8194-87da56748186%40uni-muenster.de

Attachment Content-Type Size
v10-0001-Add-XMLCast-function-SQL-XML-X025.patch text/x-patch 124.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-05-19 23:38:47 Re: Regression in statement locations
Previous Message Dmitry Koval 2025-05-19 22:35:45 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands