Re: WIP - xmlvalidate implementation from TODO list

From: Marcos Magueta <maguetamarcos(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP - xmlvalidate implementation from TODO list
Date: 2026-01-10 05:26:11
Message-ID: CAN3aFCe_cBshj0rb7J8yoT+fRHOBOZmk-m8V7DMLDe0ZjSgjcA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello there, Postgres hackers!

Since we last talked I tried to actually implement a catalog for XML
schemas. It was a bit challenging personally.

I particularly had an issue with SYSCACHE so I can search the xmlschema
entries. I mostly tried to follow examples around the codebase (specially
when it comes to security, which I mostly mirrored from sequences), so if I
am misusing something, that's likely the problem, so please point that out.
Some highlights of the changes are:

What's implemented so far:
- New pg_xmlschema catalog (OIDs 6434 to 6438) for storing XML schema
definitions
- The validation function itself: XMLVALIDATE(DOCUMENT xml_expr ACCORDING
TO XMLSCHEMA xml_schema_name)
- DDL support: CREATE XMLSCHEMA, DROP XMLSCHEMA, ALTER XMLSCHEMA (with IF
NOT EXISTS, RENAME, SET SCHEMA, OWNER TO)
- Two syscaches (XMLSCHEMAOID and XMLSCHEMANAMENSP) for lookups during
parsing and execution
- Dependency tracking so schemas can't be dropped while views/rules
reference them (I think I even added a test for it, but it's worth checking
more closely)
- Integration with PostgreSQL's object system (event triggers, security
labels, permissions) -- I was mostly following what was in place already
and adding what seemed necessary
- Grants and role permissions

The flow is the following: schemas are registered via CREATE XMLSCHEMA,
then at parse time the schema name gets resolved to an OID (with
dependencies recorded), and finally at execution time the schema is fetched
by OID and used for validation. Please note that I only implemented
DOCUMENT mode for now, as it seemed like the most common use case and kept
things simpler. CONTENT mode by the standard seems to require another
keyword to make things useful (ELEMENT), which is a bit problematic since
it's a fairly used word to conflict on any database (it broke on the
current tests when I ran already, so...). With the ELEMENT we could do
things like: XMLVALIDATE(CONTENT '<author>C. J. Date</author>' ACCORDING TO
XMLSCHEMA book ELEMENT author), which is fairly powerful if you want to
validate just parts of the xml document, but again, it's scrapped from this
version because I wanted to pull my hair out dealing with it.

Following the SQL/XML standard, XMLVALIDATE returns the validated XML, not
a boolean anymore, and raises ERROR on validation failure. So now we can do
things like INSERT INTO r VALUES (XMLVALIDATE(...)) and other cool
compositions.

The documentation is still tracking the old xmlvalidate behavior and
nothing about the catalog changes.I can fix all that before submission if
the overall approach looks reasonable.

Permissions are a bit trickier, I thought of just relying on USAGE, since
the question for an XML validation is "Can I use this XML schema in a
call?". I think FDW works like that currently. I tried adding a bunch of
examples on xml.sql.

One change I specifically wanted feedback on are the changes on
parse_expr.c. Since I need to resolve the name of the schema, I thought
doing so at parsing time was the correct approach. In there I extract
schema name before the ResTarget loop runs, resolve it to OID using
get_xmlschema_oid(), save the string form in newx->name for deparsing (so
views can be displayed correctly) and finally clear x->named_args = NIL so
the ResTarget loop below has nothing to process. This approach reuses the
existing named_args field as temporary storage during parsing, then
converts it properly during transformation. It felt strange, but it avoids
adding another field to the XmlExpr structure just for XMLVALIDATE.

Also, I made the variables that get modified inside of the PG_TRY block
volatile as it seems to be a pattern likely to force the compiler to do
memory access instead of relying on registers during jumps.

I'm attaching the patch for another round of review (please excuse the lack
of the linter, I tried running but had some issues). Any feedback would be
really appreciated, this was quite an adventure.

PS: I noticed now that I haven't copied the changes over to xml1.out nor
xml2.out. But This is just for the feature validation round, so I can fix
that later.

Thanks for taking a look!

Attachment Content-Type Size
xmlschema_xmlvalidate.patch application/octet-stream 233.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul A Jungwirth 2026-01-10 06:16:26 Re: SQL:2011 Application Time Update & Delete
Previous Message zengman 2026-01-10 04:59:48 [PATCH] Fix minor issues in astreamer_zstd.c