Re: WIP - xmlvalidate implementation from TODO list

From: Marcos Magueta <maguetamarcos(at)gmail(dot)com>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: WIP - xmlvalidate implementation from TODO list
Date: 2025-12-19 03:25:51
Message-ID: CAN3aFCfvVgXr77o=dB_E2kSCY+EgckSQbSBdd_N9n-LauWuQLw@mail.gmail.com
Lists: pgsql-hackers

Hello again!

I took some time to actually finish this feature, and I think the
answers to the earlier questions are now clearer. I checked the
initialization, and the protections have indeed been in place since
commit a4b0c0aaf093a015bebe83a24c183e10a66c8c39, whose message states:

> Prevent access to external files/URLs via XML entity references.

> xml_parse() would attempt to fetch external files or URLs as needed to
> resolve DTD and entity references in an XML value, thus allowing
> unprivileged database users to attempt to fetch data with the privileges
> of the database server. While the external data wouldn't get returned
> directly to the user, portions of it could be exposed in error messages
> if the data didn't parse as valid XML; and in any case the mere ability
> to check existence of a file might be useful to an attacker.
>
> The ideal solution to this would still allow fetching of references that
> are listed in the host system's XML catalogs, so that documents can be
> validated according to installed DTDs. However, doing that with the
> available libxml2 APIs appears complex and error-prone, so we're not going
> to risk it in a security patch that necessarily hasn't gotten wide review.
> So this patch merely shuts off all access, causing any external fetch to
> silently expand to an empty string. A future patch may improve this.

Given that, the obvious design choice for the xmlvalidate
implementation was not to rely on external schema sources from the
host's XML catalogs. The implementation therefore accepts only
expressions that evaluate to a schema in plain text.

I added the requested documentation and a bunch of tests for each
scenario. I would appreciate another round of reviews whenever someone
has the time and patience.

Finally, to satisfy any curiosity: I had issues with make check, as
mentioned earlier in this thread. These were resolved when I changed
`execl` to `execlp` in `pg_regress.c`. I did not, of course, commit
that change, but other people I know have hit the very same issue
while relying on immutable package managers.
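
For anyone unfamiliar with the difference, here is a minimal sketch of
why that swap can matter. It is not the pg_regress code; `run_shell`
and its parameters are hypothetical names made up for illustration.
The only assumption is the POSIX behaviour that `execl` uses the given
name as a path as-is, while `execlp` looks up a name without a slash
in PATH.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "shell -c cmd" in a child process and return
 * the child's exit status.
 *
 * search_path != 0: use execlp(), which searches PATH when "shell"
 *                   contains no slash.
 * search_path == 0: use execl(), which treats "shell" as a path exactly
 *                   as given (relative to the current directory if it
 *                   is not absolute).
 *
 * Returns 127 if the exec call itself fails, and -1 if fork/waitpid
 * fails or the child terminates abnormally.
 */
int
run_shell(const char *shell, int search_path, const char *cmd)
{
    pid_t pid = fork();
    int   status;

    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        /* Child: replace the process image with the shell. */
        if (search_path)
            execlp(shell, shell, "-c", cmd, (char *) NULL);
        else
            execl(shell, shell, "-c", cmd, (char *) NULL);

        /* Only reached if exec failed, e.g. the shell was not found. */
        _exit(127);
    }

    /* Parent: wait for the child and report how it exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this sketch, `run_shell("sh", 1, "exit 7")` returns 7 wherever an
`sh` is on PATH, whereas `run_shell("sh", 0, "exit 7")` would normally
return 127 unless an executable named `sh` happens to sit in the
current directory. That matches what I would expect on systems where
the shell is not at a fixed, conventional location.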

Attachment Content-Type Size
0001-full-xmlvalidate-text-schema-implementation.patch application/octet-stream 33.6 KB
