Re: BUG #4697: to_tsvector hangs on input

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Peter Guarino <peterguarino(at)earthlink(dot)net>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4697: to_tsvector hangs on input
Date: 2009-03-06 21:32:10
Message-ID: 49B1965A.30705@enterprisedb.com
Lists: pgsql-bugs

Peter Guarino wrote:
> The following bug has been logged online:
> 
> Bug reference:      4697
> Logged by:          Peter Guarino
> Email address:      peterguarino(at)earthlink(dot)net
> PostgreSQL version: 8.3.3
> Operating system:   Suse 10.x
> Description:        to_tsvector hangs on input
> Details: 
> 
> Certain strings involving the @ character cause the to_tsvector function to
> hang and the postgres server process handling the connection to spin the cpu
> at 100%. Attempts to gracefully kill the server process are unsuccessful and
> a 'kill -9' becomes necessary. Here is an example of such a string:
> 4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4@D4(at)D4
> @D4(at)D4@D4(at)D4@D4(at)D4@D2?C

Hmm, it looks like it's not actually completely hung, but there's a 
nasty O(n^2) recursion in the parser, making the parsing time 
exponentially longer as the string gets longer. If you make the string 
shorter, it will finish before you get tired of waiting, but will still 
take a long time.

When the parser sees the "@", it goes into "email state". In email 
state, it recurses, trying to find out if the string after the "@" is a 
valid hostname. That in turn goes into email state, recurses again and 
so forth, until you reach the end of the string. Then, the recursion 
unwinds back to the first @, moving on to the next character. At the 
next "@" the cycle repeats.
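
To get a feel for how quickly that cycle blows up, here is a toy model in
Python. It is not the tsearch parser's code, only a sketch of one reading of
the description above: every "@" starts a nested scan of the remaining text,
and each nested scan keeps going after its own nested scan returns. The
function name and the synthetic input are made up for illustration.

```python
def nested_scans(text, start=0):
    """Count scanner invocations in a toy model of the unpatched behaviour.

    Assumption (not taken from the PostgreSQL source): every '@' triggers a
    nested scan of the remaining text, and each nested scan carries on to the
    next character after its own nested scan has returned.
    """
    calls = 1  # this invocation
    for i in range(start, len(text)):
        if text[i] == "@":
            calls += nested_scans(text, i + 1)
    return calls


for n in (5, 10, 15):
    sample = "4" + "@D4" * n  # synthetic input shaped like the reported string
    print(n, nested_scans(sample))
```

In this model each additional "@" doubles the number of scans (32, 1024 and
32768 for 5, 10 and 15 "@" characters), which would be consistent with a
shorter string finishing slowly and a longer one appearing to hang.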

Since the recursion only wants to know if the string after "@" is a 
valid hostname, we can stop the recursion as soon as we find out that 
it's not. The attached patch does that.
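
The idea can be sketched in Python as follows. This is not the attached
patch, just a toy illustration of stopping early: the hostname check gives up
at the first character that cannot belong to a hostname instead of starting
another nested scan. The function names and the simplified character set are
assumptions made for this example.

```python
import string

# Simplified assumption: a hostname is a run of ASCII letters, digits, '-' and '.'.
HOST_CHARS = frozenset(string.ascii_letters + string.digits + "-.")


def looks_like_hostname(text, start):
    """Return (ok, steps): whether text[start:] is a non-empty run of
    hostname characters, and how many characters were inspected."""
    steps = 0
    for i in range(start, len(text)):
        steps += 1
        if text[i] not in HOST_CHARS:
            return False, steps  # stop at the first disqualifying character
    return steps > 0, steps


def scan(text):
    """Check the text after every '@'; return the total characters inspected."""
    total = 0
    for i, ch in enumerate(text):
        if ch == "@":
            _, steps = looks_like_hostname(text, i + 1)
            total += steps
    return total


for n in (5, 10, 15, 1000):
    print(n, scan("4" + "@D4" * n))
```

Because each check ends at the next "@", the work in this sketch grows
linearly with the input: 3n - 1 character inspections for n "@" characters.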

Teodor, does this look OK to you?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: break-tsparser-recursion-1.patch
Description: text/x-diff (786 bytes)
