Using XML_PARSE_HUGE in operations on xml fields?

From: patrick mc allister <pma(at)rdorte(dot)org>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Using XML_PARSE_HUGE in operations on xml fields?
Date: 2014-03-04 15:54:21
Message-ID: 20140304155421.GM23803@rdorte.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I ran into trouble with an xpath expression on a large XML file:

SELECT id, xpath('//tei:div/descendant::tei:head/text()', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=1;

returns:

ERROR: could not parse XML document
DETAIL: line 491482: internal error: Huge input lookup
स्थापितः।। </p>
^
line 491482: Extra content at the end of the document
स्थापितः।। </p>
^

I think this is an error that comes from the libxml2 library, since
pretty much the same thing happens with xmllint if you pass the
`--memory' option.

This is the libxml2 documentation which is relevant here, I think:

"""
#define XML_MAX_DICTIONARY_LIMIT

Maximum size allowed by the parser for a dictionary by default This is
not a limitation of the parser but a safety boundary feature, use
XML_PARSE_HUGE option to override it. Introduced in 2.9.0
"""

(see http://xmlsoft.org/html/libxml-parserInternals.html#XML_MAX_LOOKUP_LIMIT)

So I was wondering if and how I could set that XML_PARSE_HUGE option
in postgresql? I couldn't find anything in the docs or in this list's
archives.

If you want to replicate the problem quickly, with one approx 37MB xml
file and one smaller one:

createdb xmlpost
psql xmlpost

CREATE TABLE test (id integer, x xml);
\set content `curl http://sarit.indology.info/downloads/mahabharata-devanagari.xml`
INSERT INTO test (SELECT 1, (SELECT XMLPARSE (DOCUMENT :'content')));
\set content `curl http://sarit.indology.info/downloads/ratnakIrti-nibandhAvali.xml`
INSERT INTO test (SELECT 2, (SELECT XMLPARSE (DOCUMENT :'content')));

SELECT id, xpath('count(//tei:div/descendant::tei:head)', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=1; -- fails

SELECT id, xpath('count(//tei:div/descendant::tei:head)', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff FROM test WHERE id=2; -- works: 13

Thanks for any hints,

--
patrick

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2014-03-04 16:00:13 Re: Role Inheritance Without Explicit Naming?
Previous Message Adrian Klaver 2014-03-04 15:42:01 Re: Segmentation fault