Re: WIP - xmlvalidate implementation from TODO list

From: Marcos Magueta <maguetamarcos(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP - xmlvalidate implementation from TODO list
Date: 2026-01-21 20:44:27
Message-ID: CAN3aFCcUFLbdVBoL6c2bMh4r5P9EnXM9eBsX8+ZyER7YBSDUtA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hey Jim,

> Any particular reason for that? If not, take a look at other options,
e.g. a_expr
No particular reason apart from it being simpler since I didn't need to
invoke an execution at the cmd. Changed it now.

> Why did you choose text over xml for schemadata?
My original thought was that XML schemas require additional validation in
contrast to normal XML, but it being additive, we would have redundant
checks. But in reconsideration, perhaps keeping the field with an XML type
is more intuitive for anyone introspecting over the catalog. Also applied
the change on the latest version of the patch.

Just a question about something I am very curious about the previous patch
I sent:
I noticed DefineXmlSchema() calls IsThereXmlSchemaInNamespace() right after
XmlSchemaCreate() returns a valid OID. Since XmlSchemaCreate() already
inserted the tuple into the catalog (via CatalogTupleInsert at
pg_xmlschema.c:166), wouldn't SearchSysCacheExists2() find it and always
throw "already exists"? We all tested the original code and it worked fine,
so I'm missing something about syscache visibility or timing; that was an
early function I did to check for duplicates that ended up in the wrong
place. I removed the call (and function) as I judged it to be redundant
(the duplicate check already happens inside XmlSchemaCreate()), but is
there something subtle about intra-command visibility I'm not
understanding? If anyone knows, please let me know.

I tried to split the patch into multiple, as recommended, but there might
still exist some overlaps when it comes to the division of the
implementation of XMLVALIDATE and XMLSCHEMA. I put the docs and tests on
the last patch, as I had issues amending.

Also, I added tab completion on psql and fixed pg_dump.

Regards, Marcos Magueta

Em qua., 14 de jan. de 2026 às 07:10, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
escreveu:

>
>
> On 14/01/2026 02:23, Marcos Magueta wrote:
> > Please follow the updated version attached.
>
> A few comments:
>
> == grammar ==
>
> In gram.y you restricted CREATE XMLSCHEMA to Sconst:
>
> DO $$
> DECLARE xsd text := '<foo></foo>';
> BEGIN
> CREATE XMLSCHEMA person_schema AS xsd;
> END $$;
> ERROR: syntax error at or near "xsd"
> LINE 4: CREATE XMLSCHEMA person_schema AS xsd;
>
> Any particular reason for that? If not, take a look at other options,
> e.g. a_expr
>
> == pg_xmlschema ==
>
> Why did you choose text over xml for schemadata?
>
> postgres=# \d pg_xmlschema
> Table "pg_catalog.pg_xmlschema"
> Column | Type | Collation | Nullable | Default
> -----------------+-----------+-----------+----------+---------
> oid | oid | | not null |
> schemaname | name | | not null |
> schemanamespace | oid | | not null |
> schemaowner | oid | | not null |
> schemadata | text | C | not null |
> schemaacl | aclitem[] | | |
> Indexes:
> "pg_xmlschema_oid_index" PRIMARY KEY, btree (oid)
> "pg_xmlschema_name_nsp_index" UNIQUE CONSTRAINT, btree (schemaname,
> schemanamespace)
>
>
> == psql command to display and list xml schemas ==
>
> Not a requirement for this patch (specially not at the current stage),
> but you should add it to your TODO list.
>
> \dz
> \dz foo
> \dz+ foo
>
> * z here is just an example
>
> == tab completion ==
>
> CREATE <TAB> should suggest XMLSCHEMA and CREATE XML<TAB> should
> autocomplete CREATE XMLSCHEMA. The same applies for DROP XMLSCHEMA [IF
> EXISTS], where it should additionally list the available schemas after
> DROP XMLSCHEMA <TAB>.
>
> == white-space warnings ==
>
> The patch does not apply cleanly:
>
>
> /home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:594:
> indent with spaces.
> Oid schemanamespace,
>
> /home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:595:
> indent with spaces.
> Oid schemaowner,
>
> /home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:596:
> indent with spaces.
> const char *schemadata,
>
> /home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:597:
> indent with spaces.
> bool if_not_exists,
>
> /home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:598:
> indent with spaces.
> bool quiet)
> warning: squelched 139 whitespace errors
> warning: 144 lines add whitespace errors.
>
> == file naming ==
>
> Your patch suggests that it is part of a patch set, from which 0001 is
> missing. In case you meant a version 2 of the previous patch, a better
> format would be
>
> v2-0001-xmlschema-catalog-and-xmlvalidate.patch
>
> which can be generated with
>
> $ git format-patch -1 -v2
>
> == xml_1.out not updated ==
>
> After every change in xml.sql you must create an equivalent file for a
> postgres compiled without --with-libxml, and put the changes in
> xml_1.out.[1]
>
> == corrupt pg_dump ==
>
> I understand we agreed to work on XMLVALIDATE only after CREATE
> XMLSCHEMA is settled, but since the code is partially already there, you
> might wanna take a look at pg_dump. It is not serialising the CREATE
> XMLSCHEMA statements:
>
> $ /usr/local/postgres-dev/bin/psql postgres
> psql (19devel)
> Type "help" for help.
>
> postgres=# CREATE XMLSCHEMA person_schema AS '<?xml version="1.0"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
> <xs:element name="person">
> <xs:complexType>
> <xs:sequence>
> <xs:element name="name" type="xs:string"/>
> <xs:element name="age" type="xs:integer"/>
> </xs:sequence>
> </xs:complexType>
> </xs:element>
> </xs:schema>';
> CREATE XMLSCHEMA
> postgres=# CREATE VIEW v AS
> SELECT XMLVALIDATE(DOCUMENT '<bar></bar>'::xml
> ACCORDING TO XMLSCHEMA person_schema);
> CREATE VIEW
> postgres=# \q
>
> $ /usr/local/postgres-dev/bin/pg_dump postgres
> --
> -- PostgreSQL database dump
> --
>
> \restrict WLaIQWmNJVW2yc4Jv5W81qh2TZGHnupJlpl4Urm4Pp6Ku3VPyH5dO3ReFc4LMmd
>
> -- Dumped from database version 19devel
> -- Dumped by pg_dump version 19devel
>
> SET statement_timeout = 0;
> SET lock_timeout = 0;
> SET idle_in_transaction_session_timeout = 0;
> SET transaction_timeout = 0;
> SET client_encoding = 'UTF8';
> SET standard_conforming_strings = on;
> SELECT pg_catalog.set_config('search_path', '', false);
> SET check_function_bodies = false;
> SET xmloption = content;
> SET client_min_messages = warning;
> SET row_security = off;
>
> --
> -- Name: test_xmlschema_ns; Type: SCHEMA; Schema: -; Owner: jim
> --
>
> CREATE SCHEMA test_xmlschema_ns;
>
>
> ALTER SCHEMA test_xmlschema_ns OWNER TO jim;
>
> --
> -- Name: v; Type: VIEW; Schema: public; Owner: jim
> --
>
> CREATE VIEW public.v AS
> SELECT XMLVALIDATE(DOCUMENT '<bar></bar>'::xml ACCORDING TO XMLSCHEMA
> person_schema) AS "xmlvalidate";
>
>
> ALTER VIEW public.v OWNER TO jim;
>
> --
> -- PostgreSQL database dump complete
> --
>
> \unrestrict WLaIQWmNJVW2yc4Jv5W81qh2TZGHnupJlpl4Urm4Pp6Ku3VPyH5dO3ReFc4LMmd
>
> Take a look at pg_dump.c. You might need a new function, e.g.
> dumpXmlSchemas(Archive *fout, const SchemaInfo *schemaInfo)
>
> == patch structure ==
>
> To make the review a bit easier, I suggest to split this patch into a
> patch set with **at least 4** smaller patches - the more seasoned
> hackers here might correct me if I am wrong. For instance:
>
> 0001 - CREATE XMLSCHEMA (code + tests + documentation)
> 0002 - pg_dump changes to output CREATE XMLSCHEMA
> 0003 - psql tab completion + new command to display and list xml schemas
> 0004 - XMLVALIDATE (code + tests + documentation)
>
>
> Thanks for working on this!
>
> Best, Jim
>
> [1] https://cirrus-ci.com/task/4872456290172928?logs=check_world#L126
>

Attachment Content-Type Size
0004-Add-XMLVALIDATE-function-for-XML-schema-validation.patch application/octet-stream 11.0 KB
0003-Add-psql-tab-completion-for-XMLSCHEMA-commands.patch application/octet-stream 2.4 KB
0002-Add-pg_dump-support-for-XMLSCHEMA-objects.patch application/octet-stream 7.5 KB
0001-Add-CREATE-ALTER-DROP-XMLSCHEMA-DDL-commands.patch application/octet-stream 43.4 KB
0005-Add-tests-and-documentation-for-XMLSCHEMA-feature.patch application/octet-stream 214.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Neil Conway 2026-01-21 20:49:59 Re: Speed up COPY FROM text/CSV parsing using SIMD
Previous Message Ilia Evdokimov 2026-01-21 20:44:08 Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types