
Re: tsearch parser inefficiency if text includes urls or emails - new version

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Andres Freund" <andres(at)anarazel(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Cc:	<greg(at)2ndquadrant(dot)com>,<oleg(at)sai(dot)msu(dot)su>, <teodor(at)sigaev(dot)ru>
Subject:	Re: tsearch parser inefficiency if text includes urls or emails - new version
Date:	2009-12-10 17:01:05
Message-ID:	4B20D4F1020000250002D2F1@gw.wicourts.gov
Lists:	pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> wrote:

> I think you see no real benefit, because your strings are rather
> short - the documents I scanned when noticing the issue were
> rather long.

The document I used in the test which showed the regression was
672,585 characters, containing 10,000 URLs.

> A rather extreme/contrived example:

> postgres=# SELECT 1 FROM to_tsvector(array_to_string(ARRAY(SELECT
> 'andres(at)anarazel(dot)de http://www.postgresql.org/'::text FROM
> generate_series(1, 20000) g(i)), ' - '));

The most extreme of your examples uses a 979,996 character string,
which is less than 50% larger than my test. I am, however, able to
see the performance difference for this particular example, so I now
have something to work with. I'm seeing some odd behavior in when a
difference shows up and how large it is. Once I can categorize it
better, I'll follow up.
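
As a quick check of the size comparison above, here is a minimal Python
sketch. It uses only the two character counts quoted in this message; the
variable and function names are illustrative and do not come from the thread.

```python
# Character counts quoted in the message above.
kevin_doc_chars = 672_585       # test document that showed the regression
andres_example_chars = 979_996  # string built by the most extreme example


def percent_larger(new: int, base: int) -> float:
    """Return how much larger `new` is than `base`, as a percentage."""
    return (new - base) / base * 100.0


growth = percent_larger(andres_example_chars, kevin_doc_chars)
print(f"{growth:.1f}% larger")  # prints "45.7% larger"
```

The result is consistent with "less than 50% larger": the two inputs are
the same order of magnitude, so size alone may not explain the different
timings.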

Thanks for the sample which shows the difference.

-Kevin

In response to

Re: tsearch parser inefficiency if text includes urls or emails - new version at 2009-12-09 18:06:45 from Andres Freund

Responses

Re: tsearch parser inefficiency if text includes urls or emails - new version at 2009-12-10 18:10:24 from Kevin Grittner
