Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From: Sergey Mirvoda <sergey(at)mirvoda(dot)com>
To: andrew(at)tao11(dot)riddles(dot)org(dot)uk
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Андрей Бородин <borodin(at)octonica(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15420: Server crash. Segmentation fault when parsing xml file
Date: 2018-10-05 12:08:48
Message-ID: CALkWAriUN-6GsYyURvAB5f5+HsDbb_bx1YgsXMjs0xsMvCd-xQ@mail.gmail.com
Lists: pgsql-bugs

On Fri, Oct 5, 2018 at 10:08 AM Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
wrote:

> >>>>> "Andrey" == Andrey Borodin <x4mmm(at)yandex-team(dot)ru> writes:
>
> >> You're sure about that libxml2 version? I can reproduce a crash on
> >> 2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
> >> message instead)
>
> Andrey> You are right, there was default 2.9.4 from OS, and 2.9.4 from
> Andrey> brew was not used.
>
> Andrey> x4mmm-osx:pgsql x4mmm$ xmllint --version
> Andrey> xmllint: using libxml version 20904
>
> I have a complete diagnosis of why it crashes on 2.9.4, and I can see
> why it does not crash the same way on 2.9.7, but I would not bet
> anything on 2.9.7 not having some comparable issue.
>
> What happens on 2.9.4 is this (this is all inside libxml2):
>
> - at some point when parsing an element tag, the code decides to raise
> a fatal error and call xmlHaltParser
>
> - xmlHaltParser works by resetting the input buffer's "base" and "cur"
> pointers to point to a literal "" in the code (thus, a null byte)
>
> - xmlParseStartTag2 detects that input->base has changed, and assumes
> that this is because the buffer got reallocated; in the process of
> dealing with this, it resets input->cur to input->base + cur where
> "cur" is a local variable holding the previous offset in the buffer
> (which is now of course nonsense, so input->cur points into the
> weeds)
>
> - something later tries to access the byte at *input->cur and likely
> crashes (depending on many random factors, including load addresses
> of shared libraries and where in the buffer the original error was
> detected)
>
> Between 2.9.4 and 2.9.7 xmlParseStartTag2 was changed to handle buffer
> reallocations differently so it doesn't fail the same way (it no longer
> tries to modify input->cur). But there are so many ways that this error
> path can screw itself up that I honestly would not trust it for one
> second.
>
> --
> Andrew (irc:RhodiumToad)
>

Sorry for the top-posting and the spelling; T9 and mobile Gmail are not very usable.

Some notes.

If I set xmloption to document, this code works as expected:
postgres=# select d::xml from
convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251')
g(d);
....
postgres=# select xml_is_well_formed(d) from
convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251')
g(d);
xml_is_well_formed
--------------------
t
(1 row)

But all other XML functions still crash the server.

For example:
postgres=# select xpath_exists('//СвЮЛ'::text,d::xml) from
convert_from(pg_read_binary_file('egrul/EGRUL_FULL_2018-01-01_X.XML'),'windows-1251')
g(d);

--
--Regards, Sergey Mirvoda
