Re: procost for to_tsvector

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: procost for to_tsvector
Date: 2015-03-11 16:26:04
Message-ID: 20150311162604.GL12445@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-03-11 12:07:20 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2015-03-11 14:40:16 +0000, Andrew Gierth wrote:
> >> ,but even without doing that, there's a strong
> >> argument that it should be increased to at least the order of 100.
>
> Nyet ... at least not without you actually making that argument, with
> numbers, rather than just handwaving. We use 100 for plpgsql and suchlike
> functions. I'd be OK with making it 10 just on general principles, but
> claiming that it's as expensive as a plpgsql function requires
> evidence.

I'll note that you proposed a cost higher than 10 some years back ;):
http://www.postgresql.org/message-id/8971.1255891843@sss.pgh.pa.us

What you said back then makes sense to me:

On 2009-10-18 14:50:43 -0400, Tom Lane wrote:
> In another case I was looking at just now, it seems that to_tsquery()
> and to_tsvector() are noticeably slower than most other built-in
> functions, which is not surprising given the amount of mechanism that
> gets invoked inside them. It would be useful to tell the planner
> about that to discourage it from picking seqscan plans that involve
> repeated execution of these functions.

A trivial comparison with a simple plpgsql function shows this:
CREATE FUNCTION a_simple_plpgsql_function(a text) RETURNS text
LANGUAGE plpgsql AS $$BEGIN RETURN repeat(a, 3); END;$$;

SELECT a_simple_plpgsql_function('This is a long sentence in english. Or maybe not so long after all. But it includes a Metal Ümlaut. And parens: ()! Also a number: ' ||g.i)
FROM generate_series(1, 10000) g(i);
Time: 32.898 ms

and
SELECT to_tsvector('english',
'This is a long sentence in english. Or maybe not so
long after all. But it includes a Metal Ümlaut. And
parens: ()! Also a number: ' ||g.i)
FROM generate_series(1, 10000) g(i);
Time: 450.996 ms

Given that this is a short sentence and a simple text search
configuration, a factor of 10 between them (the run above shows roughly
14x) doesn't sound wrong. This is obviously completely unscientific, but ...
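
For anyone wanting to repeat this, a rough sketch of how one might look at
the current cost settings and try a different value. The cost of 100 is just
the figure suggested upthread, not a recommendation; the ALTER FUNCTION needs
superuser rights since the function lives in pg_catalog, and the whole
experiment is wrapped in a transaction that is rolled back. The test string
in the EXPLAIN is a placeholder, and a bare generate_series() query won't
change plan shape; only the estimated cost should differ.

```sql
-- Sketch only: show the planner cost currently assigned to the text search
-- functions (procost is expressed in units of cpu_operator_cost).
SELECT p.oid::regprocedure AS signature, p.procost
FROM pg_proc p
WHERE p.proname IN ('to_tsvector', 'to_tsquery')
ORDER BY 1;

-- Ratio of the two timings quoted above.
SELECT round(450.996 / 32.898, 1) AS slowdown;  -- 13.7

-- Try a higher cost without making it permanent.
BEGIN;
ALTER FUNCTION to_tsvector(regconfig, text) COST 100;  -- assumed value
EXPLAIN
SELECT to_tsvector('english', 'some placeholder text ' || g.i)
FROM generate_series(1, 10000) g(i);
ROLLBACK;
```

To see the planner actually change its mind, the same experiment would have
to be run against a real table where an index scan competes with a seqscan
that evaluates to_tsvector() per row.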

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
