Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Sergey Mirvoda <sergey(at)mirvoda(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Andrew Borodin <borodin(at)octonica(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15420: Server crash. Segmentation fault when parsing xml file
Date: 2018-10-05 12:44:58
Message-ID: CAFj8pRBDcq=-3waVae98+KpoxDySbJOvLQ2yhCup4dgdCViJpQ@mail.gmail.com
Lists: pgsql-bugs

On Fri, Oct 5, 2018 at 14:09 Sergey Mirvoda <sergey(at)mirvoda(dot)com>
wrote:

>
> On Fri, Oct 5, 2018 at 10:08 AM Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
> wrote:
>
>> >>>>> "Andrey" == Andrey Borodin <x4mmm(at)yandex-team(dot)ru> writes:
>>
>> >> You're sure about that libxml2 version? I can reproduce a crash on
>> >> 2.9.4, but have as yet failed to do so on 2.9.7 (fails with an error
>> >> message instead)
>>
>> Andrey> You are right, there was the default 2.9.4 from the OS, and 2.9.4
>> Andrey> from brew was not used.
>>
>> Andrey> x4mmm-osx:pgsql x4mmm$ xmllint --version
>> Andrey> xmllint: using libxml version 20904
>>
>> I have a complete diagnosis of why it crashes on 2.9.4, and I can see
>> why it does not crash the same way on 2.9.7, but I would not bet
>> anything on 2.9.7 not having some comparable issue.
>>
>> What happens on 2.9.4 is this (this is all inside libxml2):
>>
>> - at some point when parsing an element tag, the code decides to raise
>> a fatal error and call xmlHaltParser
>>
>> - xmlHaltParser works by resetting the input buffer's "base" and "cur"
>> pointers to point to a literal "" in the code (thus, a null byte)
>>
>> - xmlParseStartTag2 detects that input->base has changed, and assumes
>> that this is because the buffer got reallocated; in the process of
>> dealing with this, it resets input->cur to input->base + cur where
>> "cur" is a local variable holding the previous offset in the buffer
>> (which is now of course nonsense, so input->cur points into the
>> weeds)
>>
>> - something later tries to access the byte at *input->cur and likely
>> crashes (depending on many random factors, including load addresses
>> of shared libraries and where in the buffer the original error was
>> detected)
>>
>> Between 2.9.4 and 2.9.7 xmlParseStartTag2 was changed to handle buffer
>> reallocations differently so it doesn't fail the same way (it no longer
>> tries to modify input->cur). But there are so many ways that this error
>> path can screw itself up that I honestly would not trust it for one
>> second.
>>
>> --
>> Andrew (irc:RhodiumToad)
>>
>
>
> Sorry for the top posting and spelling; T9 and mobile Gmail are not very usable.
>
> Some notes.
>
> If I set xmloption to document, this code works as expected:
> postgres=# select d::xml from
> convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251')
> g(d);
> ....
> postgres=# select xml_is_well_formed(d) from
> convert_from(pg_read_binary_file('EGRUL_FULL_2018-01-01_X.XML'),'windows-1251')
> g(d);
> xml_is_well_formed
> --------------------
> t
> (1 row)
>
> but all other XML functions still crash the server.
>
> For example:
> postgres=# select xpath_exists('//СвЮЛ'::text,d::xml) from
> convert_from(pg_read_binary_file('egrul/EGRUL_FULL_2018-01-01_X.XML'),'windows-1251')
> g(d);
>
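To make the failure mode in Andrew's quoted diagnosis concrete, here is a minimal C sketch of the stale-offset pattern. It is a toy model under loudly stated assumptions: `Input`, `halt_parser`, `parse_start_tag` and `run_case` are hypothetical names, not libxml2's real structures or functions, and the 5 "consumed" bytes are arbitrary. Rather than actually forming the out-of-range pointer (undefined behavior in C), the sketch reports when `base + offset` would land outside the 1-byte `""` literal. The `check_halt` flag is a hypothetical defensive variant, not the change libxml2 made between 2.9.4 and 2.9.7.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a libxml2-style input buffer. */
typedef struct {
    const char *base;  /* start of the current buffer */
    const char *cur;   /* current read position */
    size_t len;        /* readable bytes starting at base (excluding final NUL) */
    int halted;        /* set when the parser has been halted */
} Input;

/* Models xmlHaltParser: base and cur now point at a literal "" (one NUL byte). */
static void halt_parser(Input *in) {
    in->base = "";
    in->cur = in->base;
    in->len = 0;
    in->halted = 1;
}

/*
 * Models the 2.9.4-style pattern: remember the offset of cur, and if base
 * changes, assume the buffer was reallocated and restore cur = base + offset.
 * Returns 0 on success, -1 if the halt was noticed and the parse abandoned,
 * -2 if the restored position would lie outside the new buffer.
 */
static int parse_start_tag(Input *in, int fail_midway, int check_halt) {
    assert(in->len >= 5);
    in->cur += 5;                               /* consume part of the tag */
    const char *old_base = in->base;
    size_t off = (size_t)(in->cur - in->base);  /* like the local "cur" offset */

    if (fail_midway)
        halt_parser(in);                        /* fatal error inside the tag */

    if (in->base != old_base) {
        if (check_halt && in->halted)
            return -1;                          /* defensive: stop, do not rebase */
        if (off > in->len)
            return -2;                          /* base + off would point into the weeds */
        in->cur = in->base + off;               /* offset still fits: rebase */
    }
    return 0;
}

/* Runs one scenario against a fresh copy of a small start tag. */
static int run_case(int fail_midway, int check_halt) {
    static const char tag[] = "<a attr='x'>";
    Input in = { tag, tag, sizeof tag - 1, 0 };
    return parse_start_tag(&in, fail_midway, check_halt);
}
```

In this model `run_case(1, 0)` returns -2: after the halt, the saved offset of 5 no longer fits the zero-length buffer, which is the "input->cur points into the weeds" situation. `run_case(1, 1)` returns -1 because the halt is checked before any rebasing is attempted.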

There are two different parsing methods:

xmlCtxtReadDoc (used when the value is parsed as DOCUMENT) versus xmlParseBalancedChunkMemory (used for CONTENT)

The problem is with xmlParseBalancedChunkMemory

Regards

Pavel

> --
> --Regards, Sergey Mirvoda
>
