Re: BUG #15420: Server crash. Segmentation fault when parsing xml file

From: Sergey Mirvoda <sergey(at)mirvoda(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Андрей Бородин <borodin(at)octonica(dot)com>, michael(at)paquier(dot)xyz, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15420: Server crash. Segmentation fault when parsing xml file
Date: 2018-10-04 14:11:47
Message-ID: CALkWArjA5ApwXTnWWGMSmw6CFUaaTWHiL5gmJuMZXsMsb0tqeQ@mail.gmail.com
Lists: pgsql-bugs

Thu, 4 Oct 2018, 19:03 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:

>
>
> On Thu, 4 Oct 2018 at 13:47, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>>
>>
>> On Thu, 4 Oct 2018 at 13:43, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
>> wrote:
>>
>>>
>>>
>>> On 4 Oct 2018, at 16:38, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>>> wrote:
>>>
>>>
>>>
>>>
>>>> Actually, we found this error on a very fresh installation of Ubuntu 16.04
>>>> and Postgres 10.5.
>>>> After that we checked every configuration we have,
>>>> and only Postgres 9.4 works as expected.
>>>>
>>>
>>> This issue is related to libxml2 limits, and it does not work with
>>> modern libxml2 libraries.
>>>
>>> Yes, the root cause is inside the libxml2 code.
>>>
>>> Can we protect the postmaster from crashing on a libxml2 error? There are
>>> a bunch of PG_TRY blocks there, but they do not help.
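For context on why PG_TRY does not help: PostgreSQL's PG_TRY/PG_CATCH macros are built on sigsetjmp()/siglongjmp(), so a catch block runs only when code explicitly reports an error (elog/ereport(ERROR) ends in a siglongjmp). A segmentation fault is a signal delivered by the kernel and never takes that path. The following is a minimal sketch of the same pattern using the ISO C setjmp()/longjmp() equivalents; exception_stack, throw_error(), parse_document() and try_parse() are hypothetical names, not PostgreSQL's actual API.

```c
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical analogue of PostgreSQL's PG_exception_stack. */
static jmp_buf *exception_stack = NULL;

/* Hypothetical error reporter: like elog(ERROR), it jumps to the innermost "try". */
static void throw_error(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    longjmp(*exception_stack, 1);
}

/* Hypothetical parser that reports bad input instead of crashing. */
static int parse_document(int bad)
{
    if (bad)
        throw_error("invalid document");
    return 42;
}

/* Returns the parse result, or -1 if parse_document() raised an error. */
int try_parse(int bad)
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;
    volatile int result = -1; /* volatile: its value must survive a longjmp */

    if (setjmp(local) == 0) {
        /* "PG_TRY" branch: register this frame as the jump target */
        exception_stack = &local;
        result = parse_document(bad);
    } else {
        /* "PG_CATCH" branch: reached only through an explicit longjmp */
        result = -1;
    }
    /* "PG_END_TRY": restore the outer handler */
    exception_stack = saved;
    return result;
}
```

Here try_parse(0) returns 42 and try_parse(1) returns -1 through the catch branch. A crash inside parse_document() would bypass both branches, because no longjmp is ever performed.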
>>>
>>
>> Unfortunately, no. You cannot handle a crash. PostgreSQL doesn't start a
>> separate process for libxml2 calls, so a fault there is fatal.
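Containing a crash like this generally means running the risky call in a separate process and checking how that process ended, which, as noted above, PostgreSQL does not do for libxml2. A minimal POSIX sketch, assuming a hypothetical risky_parse() that simulates the libxml2 fault with raise(SIGSEGV); it only illustrates process isolation and is not how PostgreSQL invokes libxml2.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a library call (e.g. XML parsing) that may crash. */
static void risky_parse(int crash)
{
    if (crash) {
        signal(SIGSEGV, SIG_DFL); /* ensure the default action (terminate) applies */
        raise(SIGSEGV);           /* simulate a segmentation fault deterministically */
    }
}

/*
 * Run risky_parse() in a child process so a crash cannot take down the caller.
 * Returns 0 if the child exited normally with status 0, the signal number if
 * the child was killed by a signal, or -1 on fork/wait failure or nonzero exit.
 */
int run_isolated(int crash)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* child: do the risky work, then exit without running atexit handlers */
        risky_parse(crash);
        _exit(0);
    }
    /* parent: wait for the child and classify how it ended */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```

run_isolated(0) returns 0, while run_isolated(1) returns SIGSEGV and the calling process keeps running. The cost is a fork() per call plus some way of passing results back to the parent (for example a pipe), so this is a design trade-off rather than a drop-in fix.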
>>
>
> I played with it, and it looks like some problem with libxml2 and your
> specific document (maybe too many multibyte chars... I don't know).
>
> I imported a 200 MB XML document with 1M items, so it makes no sense to
> limit the XML size on the PostgreSQL side.
>
> It looks like your XML document hits some corner case of libxml2 where it is
> extremely memory-expensive. From what I can see, there is a lot of long
> content inside attributes.
>
> Regards
>

Pavel, thank you for your interest.
It is definitely something inside this document.

Actually, we loaded about 10k different documents like this one, about 10 GB
of content, and the crash happens only on this one.

But every other parser we tried (.NET, Java, Python) handled it just
fine.

For now we ended up with a custom PL/Python function for parsing XML, and it
is slow as hell.

This looks like a regression: PG 9.4 loads this document without any
problem.
