From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Thomas Kellerer <shammat(at)gmx(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Decomposing xml into table
Date: 2020-06-23 13:59:08
Message-ID: 5EF20AAC.1010006@anastigmatix.net
Lists: pgsql-hackers

On 06/23/20 08:57, Thomas Kellerer wrote:
> Surafel Temesgen wrote on 23.06.2020 at 13:59:
>>> Did you try the xmltable function?
>>
>> Yes, I know it, but I am proposing changing the given XML data into
>> relational form and inserting it into the desired table at once
> Well, xmltable() does change the XML data to a relational form and
> the result can directly be used to insert into a table
>
> insert into target_table (...)
> select ...
> from xmltable(...);
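
For example, a minimal sketch of that pattern (the table and the XML
shape here are invented purely for illustration):

    CREATE TABLE target_table (id int, name text);

    INSERT INTO target_table (id, name)
    SELECT x.id, x.name
    FROM xmltable('/rows/row'
                  PASSING '<rows>
                             <row><id>1</id><name>one</name></row>
                             <row><id>2</id><name>two</name></row>
                           </rows>'::xml
                  COLUMNS id   int  PATH 'id',
                          name text PATH 'name') AS x;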

The use case that I imagine might be driving this would be where the XML
source is not deeply or elaborately nested, but is yuuge. In such a case,
PostgreSQL's XML handling and xmltable will not be doing beautiful things:

- the data coming from the frontend will have to be completely buffered
in backend memory, and then parsed as XML once by the XML data type
input routine, only for the purpose of confirming it's XML. The unparsed
form is what becomes the Datum value, which means

- xmltable gets to parse it a second time, again all in memory, and then
generate the set-returning function result tuples from it.

- as I last understood it [1], even the tuples generated as a result
all get piled up in a tuplestore before the next stage of what you
would otherwise hope to call the "pipeline" can happen. (There may
be work underway on better pipelining for that part.)

So I would say that, for that use case, it will be hard to do better
than an external process acting as a filter: XML in, COPY-formatted
tuples out.
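
To sketch the shape of that (the "xml2copy" filter named here is
hypothetical, not a real tool):

    -- client side, streaming the file through the filter into psql:
    --   xml2copy big.xml | psql -d mydb \
    --       -c 'COPY target_table (id, name) FROM STDIN'
    -- which, on the server side, is nothing more than:
    COPY target_table (id, name) FROM STDIN;

The filter only has to emit tab-separated columns, one line per tuple,
which it can do while parsing the XML in a single streaming pass.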

The XML-processing library I'm most familiar with, Saxon, can do some
sophisticated analysis of an XML Query or XSLT transformation and
determine when it can be done while consuming the XML in streaming
mode rather than building a complete tree first. (The open-source
"community edition" doesn't have that trick, only the paid editions,
but they're otherwise compatible, so you can prototype stuff using
the community edition, and then drop in a paid version and poof, it
goes faster.)

On 06/23/20 08:25, Surafel Temesgen wrote:
> On Mon, Jun 22, 2020 at 10:13 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The big-picture question here, though, is why expend effort on XML
>> at all?
>> It seems like JSON is where it's at these days for that problem space.
>
> There are legacy systems, and I think XML is still popular

I had an interesting conversation about that at PGCon a year ago, with
someone who crystallized this idea better than I had at the time (but
may or may not want his name on it):

We tend to repeat a cycle: a new technology is introduced, minimal
at first, then develops a good ecosystem of sophisticated tooling, then
looks complicated and gets replaced with something minimal that needs to
repeat the same process.

By this point, we're on to 3.x versions of XML Query, XPath, and XSLT,
very mature languages that can express sophisticated transformations
and optimize the daylights out of them.

JSON now has JSONPATH, which is coming along and relearning the lessons
of XPath and XQuery; by the time it has, there will be something else
that's appealing because it looks more minimal, and we'll be having the
"why expend effort on JSON at all?" conversation.

Regards,
-Chap

[1] https://www.postgresql.org/message-id/12389.1563746057%40sss.pgh.pa.us
