Re: tsearch parser inefficiency if text includes urls or emails - new version

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Andres Freund" <andres(at)anarazel(dot)de>
Cc: <greg(at)2ndquadrant(dot)com>,<pgsql-hackers(at)postgresql(dot)org>, <oleg(at)sai(dot)msu(dot)su>, <teodor(at)sigaev(dot)ru>
Subject: Re: tsearch parser inefficiency if text includes urls or emails - new version
Date: 2009-12-08 22:07:04
Message-ID: 4B1E79A8020000250002D24A@gw.wicourts.gov
Lists: pgsql-hackers

I wrote:

> Perhaps it is some quirk of using 32 bit pointers on the 64 bit
> AMD CPU? (I'm looking forward to testing this today on a 64 bit
> build on an Intel CPU.)

The exact same test on a 64-bit OS (SuSE Enterprise Server) on Intel
gave very different results. With 10 runs each of 200 iterations of
parsing the 10000 URLs, the patch Andres submitted ran 0.4% faster
than HEAD, and my attempt to improve on it ran 0.6% slower than
HEAD. I'll try to run the numbers to get the probability that
random variation alone would have produced a difference as large as
either of those; but I think it's safe to say that the submitted
patch doesn't hurt there and that my attempt to improve on it was
misdirected. :-/
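
One way to "run the numbers" on differences this small is a permutation
test over the per-run timings. The following is only a sketch of that
idea, not the calculation actually used for this thread; the function
name and its parameters are made up for illustration, and it assumes
you have the ten per-run times for each build as plain lists of numbers.

```python
import random


def permutation_p_value(a, b, trials=10000, seed=0):
    """Two-sided permutation test on the difference of means.

    a, b: per-run timings for two builds (hypothetical inputs).
    Returns the fraction of random relabelings whose mean difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        # Shuffle the pooled timings and split them back into two
        # groups of the original sizes, as if the labels were random.
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    return hits / trials
```

A result near 1.0 would mean a 0.4% or 0.6% spread is what random
relabeling produces all the time; a result near 0 would suggest the
difference is real.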

I would like to independently confirm the dramatic improvement
reported by Andres. Could I get a short snippet from the log which
was used for that, along with an indication of the size of the text
parsed in that test? (Since the old code looks like it might have
O(N^2) performance in some situations, while the patch changes that
to O(N), I might not be testing with a big enough N.)
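
To illustrate why the size of N matters here, the following toy cost
model counts characters touched under two assumed behaviors. It is not
the tsearch parser code; both functions and the "rescan to end of
input" behavior are assumptions chosen only to show how an O(N^2)
pattern and an O(N) pattern diverge as the input grows.

```python
def rescan_cost(n_tokens, token_len):
    """Characters touched if every token start triggers a scan over
    all of the remaining input (an assumed quadratic pattern)."""
    total_len = n_tokens * token_len
    return sum(total_len - i * token_len for i in range(n_tokens))


def single_pass_cost(n_tokens, token_len):
    """Characters touched if the input is scanned exactly once."""
    return n_tokens * token_len


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        ratio = rescan_cost(n, 20) / single_pass_cost(n, 20)
        print(n, ratio)
```

In this model the ratio between the two costs is (n_tokens + 1) / 2, so
it is small for short inputs and grows linearly with input size. If
each of the 10000 URLs is parsed as its own short document, a quadratic
term could easily stay hidden in the noise.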

-Kevin
