Re: tsearch parser inefficiency if text includes urls or emails - new version

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Andres Freund" <andres(at)anarazel(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Cc: <greg(at)2ndquadrant(dot)com>,<oleg(at)sai(dot)msu(dot)su>, <teodor(at)sigaev(dot)ru>
Subject: Re: tsearch parser inefficiency if text includes urls or emails - new version
Date: 2009-12-10 17:01:05
Message-ID: 4B20D4F1020000250002D2F1@gw.wicourts.gov
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> wrote:

> I think you see no real benefit, because your strings are rather
> short - the documents I scanned when noticing the issue were
> rather long.

The document I used in the test which showed the regression was
672,585 characters, containing 10,000 URLs.

> A rather extreme/contrived example:

> postgres=# SELECT 1 FROM to_tsvector(array_to_string(ARRAY(SELECT
> 'andres@anarazel.de http://www.postgresql.org/'::text FROM
> generate_series(1, 20000) g(i)), ' - '));

The most extreme of your examples uses a 979,996 character string,
which is less than 50% larger than my test. I am, however, able to
see the performance difference for this particular example, so I now
have something to work with. I'm seeing some odd behavior in when a
difference shows up and how large it is. Once I can categorize it
better, I'll follow up.
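
For reference, a rough sketch of how the document sizes being compared
could be checked. It reuses the string from the quoted example; the
address is written with @ on the assumption that the archive obfuscated
it, and the aliases doc, doc_chars and size_ratio are only illustrative.
The length reported depends on the exact separator used, so it need not
match the figure above to the character.

```sql
-- Sketch only: measure the size of the generated test document
-- before timing to_tsvector() on it.
SELECT length(doc) AS doc_chars
FROM (SELECT array_to_string(ARRAY(SELECT
        'andres@anarazel.de http://www.postgresql.org/'::text
        FROM generate_series(1, 20000) g(i)), ' - ') AS doc) s;

-- Ratio of the two document sizes quoted above
-- (979,996 vs. 672,585 characters); this comes out at about 1.46.
SELECT round(979996::numeric / 672585, 2) AS size_ratio;
```

So the larger document is a bit under one and a half times the size of
the smaller one, which is what "less than 50% larger" refers to.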

Thanks for the sample which shows the difference.

-Kevin
