| From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
| Subject: | Regex with > 32k different chars causes a backend crash |
| Date: | 2013-04-03 15:11:28 |
| Message-ID: | 515C46A0.3090002@vmware.com |
| Lists: | pgsql-hackers |
While playing with Alexander's pg_trgm regexp patch, I noticed that the
regexp library trips an assertion (if enabled) or crashes, when passed
an input string that contains more than 32k different characters:
    select 'foo' ~ (select string_agg(chr(x), '')
                    from generate_series(100, 35000) x) as nastyregex;
This is because it uses 'short' as the datatype to identify colors. When
it overflows, -32768 is used as index to the colordesc array, and you
get a crash. AFAICS this can't reliably be used for anything more
sinister than crashing the backend.
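To make the overflow concrete, here is a minimal C sketch of what happens when a counter stored in a 'short' is bumped past its maximum. The typedef and the function name next_color_unchecked are hypothetical stand-ins for illustration, not the regex library's actual code, and the wraparound shown relies on the usual compiler behavior for out-of-range narrowing, which the C standard leaves implementation-defined.

```c
#include <limits.h>

/* Hypothetical stand-in for the library's color type (assumed to be short). */
typedef short color;

/*
 * Return the "next" color after max without any range check.
 * The addition happens in int; the result is then narrowed back to short.
 * If max is SHRT_MAX, the narrowing is implementation-defined and
 * typically wraps to a negative value.
 */
static color
next_color_unchecked(color max)
{
    int next = (int) max + 1;

    return (color) next;
}
```

With GCC or Clang and a 16-bit short, next_color_unchecked(SHRT_MAX) yields SHRT_MIN, i.e. -32768, which is the negative array index mentioned above.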
A regex with that many different colors is an extreme case, so I think
it's enough to turn the assertion in newcolor() into a run-time check,
and throw a "too many colors in regexp" error. Alternatively, we could
expand 'color' from short to int, but that would double the memory usage
of sane regexps with fewer distinct characters.
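A rough sketch of the run-time check idea, under stated assumptions: the struct colormap_sketch, the MAX_COLOR and COLOR_ERR macros, and the function newcolor_checked are all hypothetical names invented for this illustration and do not reflect the real newcolor() or its data structures. The point is only that the limit is tested before the counter is incremented, so the caller can report an error instead of indexing with a wrapped value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef short color;                /* assumed: colors are stored in a short */

#define MAX_COLOR  SHRT_MAX         /* assumed upper bound on color numbers */
#define COLOR_ERR  ((color) -1)     /* hypothetical failure sentinel */

/* Hypothetical, heavily simplified colormap state. */
struct colormap_sketch
{
    color max;                      /* highest color number handed out so far */
    int   err;                      /* nonzero once "too many colors" was hit */
};

/*
 * Hand out the next color number, or COLOR_ERR if the limit is reached.
 * The check happens before the increment, so max never overflows.
 */
static color
newcolor_checked(struct colormap_sketch *cm)
{
    assert(cm != NULL);

    if (cm->err)
        return COLOR_ERR;           /* an earlier failure is sticky */

    if (cm->max >= MAX_COLOR)
    {
        cm->err = 1;                /* caller would report "too many colors in regexp" */
        return COLOR_ERR;
    }

    cm->max++;
    return cm->max;
}
```

A caller would test for COLOR_ERR (or the err flag) and raise the error at that point. Keeping the type as short preserves the current per-color memory footprint, which is the trade-off against widening it to int.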
Thoughts?
- Heikki