Re: Fix quadratic performance of regexp match/split functions

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix quadratic performance of regexp match/split functions
Date: 2018-08-15 12:21:45
Message-ID: 87h8jv4zue.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:

Patch take 2. Changes:

1. Remove the cleanup function with its retail pfree()s. It was added in
commit ae65ca312 (Aug 2007) to fix a real memory leak, but was made
obsolete by commit ff428cded (Feb 2008); since then the pfree()s have
been pointless, because all the freed objects were in a memory context
that was immediately destroyed.

2. Use the presence of a conversion buffer as a flag, rather than calling
pg_database_encoding_max_length() everywhere.

3. Increase the limit on the number of matches to 134 million, and raise a
proper error message when it is reached (rather than an ugly invalid
memory request error). This limit could be removed by using
repalloc_huge, but that should probably be paired with equivalent
changes in RE_execute and done in a separate patch.

4. Use int rather than size_t for sizes that can't overflow an int, to
avoid any chance of signed/unsigned mix-ups.

5. Remove the special-case "substring to end of string" logic in the
single-byte case; appending an end-of-string position to the matches
array makes it unnecessary, and the special case gave no performance
benefit.

6. Moar commentz.
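
Items 3 and 5 both concern the array of match positions. The following is a minimal standalone sketch, not the patch's code: malloc/realloc stand in for palloc/repalloc, MAX_ALLOC_SIZE mirrors PostgreSQL's MaxAllocSize (0x3fffffff, 1 GB - 1), and MatchLocs, add_match(), finish_matches() and split_piece() are hypothetical names. It assumes each match is stored as a start/end pair of ints, which is how a cap of about 134 million matches (MaxAllocSize / sizeof(int) / 2 = 134,217,727) falls out. It also shows how a trailing end-of-string offset lets the last split piece go through the same code path as the others.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's MaxAllocSize: 1 GB - 1, the largest single palloc(). */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Assumes each match costs two ints (start and end offset): 134,217,727 matches. */
#define MAX_MATCHES ((int) (MAX_ALLOC_SIZE / sizeof(int) / 2))

/*
 * Hypothetical match array: locs[2*i] and locs[2*i+1] hold the start and end
 * byte offsets of match i; after finish_matches(), locs[2*nmatches] holds the
 * string length as an end-of-string sentinel.
 */
typedef struct MatchLocs
{
    int        *locs;
    int         nmatches;   /* start/end pairs stored */
    int         cap;        /* allocated size of locs, in ints */
} MatchLocs;

/* Grow locs to hold at least `need` ints by doubling; 0 on success, -1 on failure. */
static int
reserve_locs(MatchLocs *m, int need)
{
    int         newcap;
    int        *p;

    if (need <= m->cap)
        return 0;
    newcap = (m->cap > 0) ? m->cap : 16;
    while (newcap < need)
        newcap *= 2;
    if (newcap > 2 * MAX_MATCHES + 1)
        newcap = 2 * MAX_MATCHES + 1;   /* never request more than MAX_ALLOC_SIZE */
    p = realloc(m->locs, (size_t) newcap * sizeof(int));
    if (p == NULL)
        return -1;
    m->locs = p;
    m->cap = newcap;
    return 0;
}

/* Record one match; fails with a clear message once MAX_MATCHES pairs are stored. */
static int
add_match(MatchLocs *m, int start, int end)
{
    if (m->nmatches >= MAX_MATCHES)
    {
        fprintf(stderr, "too many regular expression matches\n");
        return -1;
    }
    /* the trailing +1 keeps room for the end-of-string sentinel */
    if (reserve_locs(m, 2 * (m->nmatches + 1) + 1) != 0)
        return -1;
    m->locs[2 * m->nmatches] = start;
    m->locs[2 * m->nmatches + 1] = end;
    m->nmatches++;
    return 0;
}

/* Append the end-of-string offset after the last start/end pair. */
static int
finish_matches(MatchLocs *m, int str_len)
{
    if (reserve_locs(m, 2 * m->nmatches + 1) != 0)
        return -1;
    m->locs[2 * m->nmatches] = str_len;
    return 0;
}

/*
 * Bounds of split piece i (0 <= i <= nmatches): from the end of match i-1
 * (or 0) to the start of match i; the sentinel serves as the final "start",
 * so the last piece needs no special case.
 */
static void
split_piece(const MatchLocs *m, int i, int *start, int *len)
{
    int         from;

    assert(i >= 0 && i <= m->nmatches);
    from = (i == 0) ? 0 : m->locs[2 * i - 1];
    *start = from;
    *len = m->locs[2 * i] - from;
}
```

For example, splitting "a,b,,c" on "," records matches at [1,2), [3,4) and [4,5); after finish_matches(&m, 6), split_piece() yields (start, len) pairs (0,1), (2,1), (4,0) and (5,1), i.e. "a", "b", "" and "c".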

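The signed/unsigned hazard behind item 4 can be shown in a few lines of standalone C; the function names here are hypothetical and not from the patch. When an int is compared with a size_t, the usual arithmetic conversions turn the int into an unsigned value, so a negative offset silently compares as huge:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Mixed comparison: pos is converted to size_t's unsigned type before the
 * comparison, so pos == -1 becomes SIZE_MAX and the result is false.
 */
static bool
pos_before_end_mixed(int pos, size_t len)
{
    return pos < len;
}

/* Both operands are int, so this is an ordinary signed comparison. */
static bool
pos_before_end_int(int pos, int len)
{
    return pos < len;
}
```

With pos = -1 and a length of 10, pos_before_end_mixed() returns false while pos_before_end_int() returns true; keeping offsets and lengths in the same signed type removes that class of surprise.
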
--
Andrew (irc:RhodiumToad)

Attachment: qregex.patch (text/x-patch, 11.3 KB)
