Re: Tsearch2 performance on big database

From: Rick Jansen <rick(at)rockingstone(dot)nl>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Mike Rylander <mrylander(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Tsearch2 performance on big database
Date: 2005-03-22 16:05:55
Message-ID: 42404263.40604@rockingstone.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Oleg Bartunov wrote:
> Mike,
>
> no comments before Rick post tsearch configs and increased buffers !
> Union shouldn't be faster than (term1|term2).
> tsearch2 internals description might help you understanding tsearch2
> limitations.
> See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
> Also, don't miss my notes:
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>
> Oleg

Thanks Oleg, i've seen those pages before :) I've set shared_buffers to
45000 now (yes thats probably very much, isn't it?) and it already seems
a lot quicker.

How do I find out what my tsearch config is? I followed the intro
(http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearch-v2-intro) and
applied it to our books table, thats all, didnt change anything else
about configs.

> On Tue, 22 Mar 2005, Mike Rylander wrote:
>> Ahh... I should have qualified my claim. I am creating a google-esqe
>> search interface and almost every query uses '&' as the term joiner.
>> 'AND' queries and one-term queries are orders of magnitude faster than
>> 'OR' queries, and fortunately are the expected default for most users.
>> (Think, "I typed in these words, therefore I want to match these
>> words"...) An interesting test may be to time multiple queries
>> independently, one for each search term, and see if the combined cost
>> is less than a single 'OR' search. If so, you could use UNION to join
>> the results.

Well I just asked my colleges and OR queries arent used by us anyway, so
I'll test for AND queries instead.

>> However, the example you originally gave ('terminology') should be
>> very fast. On a comparable query ("select count(value) from
>> metabib.full_rec where index_vector @@ to_tsquery('default','jane');")
>> I get 12ms.

ilab=# select count(*) from books where idxfti @@ to_tsquery('default',
'jane');
count
-------
4093
(1 row)
Time: 217395.820 ms

:(

ilab=# explain analyze select count(*) from books where idxfti @@
to_tsquery('default', 'jane');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=16591.95..16591.95 rows=1 width=0) (actual
time=4634.931..4634.932 rows=1 loops=1)
-> Index Scan using idxfti_idx on books (cost=0.00..16581.69
rows=4102 width=0) (actual time=0.395..4631.454 rows=4093 loops=1)
Index Cond: (idxfti @@ '\'jane\''::tsquery)
Total runtime: 4635.023 ms
(4 rows)

Time: 4636.028 ms
ilab=#

>> 16G of RAM on a dedicated machine.
>>
>>
>> shared_buffers = 15000 # min 16, at least max_connections*2,
>> 8KB each
>> work_mem = 10240 # min 64, size in KB
>> maintenance_work_mem = 1000000 # min 1024, size in KB
>> # big m_w_m for loading data...
>>
>> random_page_cost = 2.5 # units are one sequential page fetch
>> cost
>> # fast drives, and tons of RAM
>>

Right.. well I'll try copying these settings, see how that works out,
thanks :)

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Stark 2005-03-22 16:19:40 Re: What about utility to calculate planner cost constants?
Previous Message Hannu Krosing 2005-03-22 15:59:16 Re: What needs to be done for real Partitioning?