Re: Undocumented(?) limits on regexp functions

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tels <nospam-pg-abuse(at)bloodgate(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented(?) limits on regexp functions
Date: 2018-08-14 19:54:40
Message-ID: 3A779A46-C928-497A-8618-B13B551AEF9E@gmail.com
Lists: pgsql-hackers

> On Aug 14, 2018, at 10:01 AM, Tels <nospam-pg-abuse(at)bloodgate(dot)com> wrote:
>
> Moin Andrew,
>
> On Tue, August 14, 2018 9:16 am, Andrew Gierth wrote:
>>>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>
>>>> Should these limits:
>>
>>>> a) be removed
>>
>> Tom> Doubt it --- we could use the "huge" request variants, maybe, but
>> Tom> I wonder whether the engine could run fast enough that you'd want
>> Tom> to.
>>
>> I do wonder (albeit without evidence) whether the quadratic slowdown
>> problem I posted a patch for earlier was ignored for so long because
>> people just went "meh, regexps are slow" rather than wondering why a
>> trivial splitting of a 40kbyte string was taking more than a second.
>
> Pretty much this. :)
>
> First of all, thank you for working in this area; this is very welcome.
>
> We do use UTF-8 and we did notice that regexps are not actually the fastest
> around, albeit we did not (yet) run into the memory limit. Mostly because
> the regexp_match* stuff we use is only used in places where the
> performance is not key and the input/output is small (albeit, now that I
> mention it, the quadratic behaviour might explain a few slowdowns in other
> cases I need to investigate).
>
> Anyway, in a few places we have functions that use a lot (> a dozen) of
> regexps that are also moderately complex (e.g. span multiple lines). In
> these cases the performance was not really up to par, so I experimented
> and in the end rewrote the functions in plperl. Which fixed the
> performance, so we no longer had this issue.

+1. I have done something similar, though in C rather than plperl.

As for the length limit, I have only hit that in stress testing, not in
practice.

mark
