Re: Death by regexp_replace

From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Death by regexp_replace
Date: 2016-01-15 16:59:46
Message-ID: CADbMkNOn=qSqRovK8k-URzPW18JsLEV-c_WdNN2t0x4enJeCEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-www

On Fri, Jan 15, 2016 at 4:39 PM, Benedikt Grundmann <
bgrundmann(at)janestreet(dot)com> wrote:

>
> On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Kevin Grittner <kgrittn(at)gmail(dot)com> writes:
>> > On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> >> (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
>> >> practically no constraint there, leading to having to consider O(N^2)
>> >> or more possibilities.)
>>
>> > On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
>> > but it seems to be in an endless loop until you do that.
>>
>> A bit of further experimentation suggests the runtime growth is actually
>> more like O(2^N). It will terminate in a reasonable amount of time if the
>> input string is about half as long as the given example.
>>
>> The problem is that so far as the DFA engine is concerned, the pattern
>> substring '(,*\1)+' can match almost anything at all, because it's
>> equivalent to '(,*[^,]+)+' which is easily seen to match any string
>> whatever that's got at least one non-comma. So, for each possible match
>> to the substring '([^,]+)', of which there are lots, it has to consider
>> every possible way of breaking up all the rest of the string into one or
>> more substrings. The vast majority of those ways will fail when the
>> backref match is checked, but there's no way to realize it before that.
>>
>> To be clear I'm perfectly happy with that query taking forever (I didn't
> write it ;-)). The only thing I was unhappy about was that
> pg_cancel/terminate_backend didn't work. If that is fixed great.
>
>
>> regards, tom lane
>>
>
>
Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but
it looks like you haven't released at least the link on this page for 9.2

http://yum.postgresql.org/repopackages.php

says 7 in the filename which is certainly not 14 ;-)

http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm

Is that expected?

Thanks,

Bene

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2016-01-15 17:01:14 Re: pglogical - logical replication contrib module
Previous Message Stephen Frost 2016-01-15 16:53:20 Re: Multi-tenancy with RLS

Browse pgsql-www by date

  From Date Subject
Next Message Tom Lane 2016-01-15 17:17:50 package links on website (was Re: [HACKERS] Death by regexp_replace)
Previous Message Benedikt Grundmann 2016-01-15 16:39:18 Re: Death by regexp_replace