Re: Death by regexp_replace

From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Death by regexp_replace
Date: 2016-01-15 16:39:18
Message-ID: CADbMkNOQotkJ7paxcvTpny-t5iwxQswPfmtT5pG_98_iJS_cZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-www

On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Kevin Grittner <kgrittn(at)gmail(dot)com> writes:
> > On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
> >> practically no constraint there, leading to having to consider O(N^2)
> >> or more possibilities.)
>
> > On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
> > but it seems to be in an endless loop until you do that.
>
> A bit of further experimentation suggests the runtime growth is actually
> more like O(2^N). It will terminate in a reasonable amount of time if the
> input string is about half as long as the given example.
>
> The problem is that so far as the DFA engine is concerned, the pattern
> substring '(,*\1)+' can match almost anything at all, because it's
> equivalent to '(,*[^,]+)+' which is easily seen to match any string
> whatever that's got at least one non-comma. So, for each possible match
> to the substring '([^,]+)', of which there are lots, it has to consider
> every possible way of breaking up all the rest of the string into one or
> more substrings. The vast majority of those ways will fail when the
> backref match is checked, but there's no way to realize it before that.
>
> To be clear I'm perfectly happy with that query taking forever (I didn't
write it ;-)). The only thing I was unhappy about was that
pg_cancel/terminate_backend didn't work. If that is fixed great.

> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-01-15 16:45:29 Re: Multi-tenancy with RLS
Previous Message Stephen Frost 2016-01-15 16:38:21 Re: Multi-tenancy with RLS

Browse pgsql-www by date

  From Date Subject
Next Message Benedikt Grundmann 2016-01-15 16:59:46 Re: Death by regexp_replace
Previous Message Tom Lane 2016-01-15 16:26:29 Re: Death by regexp_replace