Re: tsearch2 seem very slow

From: "Ahmad Fajar" <gendowo(at)konphalindo(dot)or(dot)id>
To: "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: tsearch2 seem very slow
Date: 2005-09-23 16:40:05
Message-ID: SVONEixjnzi95HfqR77000001f5@ki-communication.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Oleg,

For single index I try this query:
explain analyze
select articleid, title, datee from articles
where fti @@ to_tsquery('bank&indonesia');

analyze result:
----------------
"Index Scan using fti_idx on articles (cost=0.00..862.97 rows=420 width=51)
(actual time=0.067..183761.324 rows=46186 loops=1)"
" Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)"
"Total runtime: 183837.826 ms"

And for multicolumn index I try this query:
explain analyze
select articleid, title, datee from articles
where fti @@ to_tsquery('bank&mega');

analyze result:
----------------
"Index Scan using articles_x1 on articles (cost=0.00..848.01 rows=410
width=51) (actual time=52.204..37914.135 rows=1841 loops=1)"
" Index Cond: ((datee >= '2002-01-01'::date) AND (datee <=
('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))"
"Total runtime: 37933.757 ms"

The table structure is as mention on the first talk. If you wanna know how
much table in my database, it's about 100 tables or maybe more. Now I
develop the version 2 of my web application, you can take a look at:
http://www.mediatrac.net, so it will hold many datas. But the biggest table
is article's table. On develop this version 2 I just use half data of the
article's table (about 419804 rows). May be if I import all of the article's
table data it will have 1 million rows. The article's table grows rapidly,
about 100000 rows per-week. My developing database size is 28 GB (not real
database, coz I still develop the version 2 and I use half of the data for
play around). I just wanna to perform quick search (fulltext search) on my
article's table not other table. On version 1, the current running version I
use same hardware spesification as mention below, but there is no fulltext
search. So I develop the new version with new features, new interface and
include the fulltext search.

I do know, if the application finish, I must use powerfull hardware. But how
can I guarantee the application will run smooth, if I do fulltext search on
419804 rows in a table it took a long time to get the result.

Could you or friends in this maling-list help me....plz..plzz

Tsearch2 configuration:
-------------------------
I use default configuration, english stop word file as tsearch2 provide,
stem dictionary as default (coz I don't know how to configure and add new
data to stem dictionary) and I add some words to the english stop word file.

Postgresql configuration
-------------------------
max_connections = 32
shared_buffers = 32768
sort_mem = 8192
vacuum_mem = 65536
work_mem = 16384
maintenance_work_mem = 65536
max_fsm_pages = 30000
max_fsm_relations = 1000
max_files_per_process = 100000
checkpoint_segments = 15
effective_cache_size = 192000
random_page_cost = 2
geqo = true
geqo_threshold = 50
geqo_effort = 5
geqo_pool_size = 0
geqo_generations = 0
geqo_selection_bias = 2.0
from_collapse_limit = 10
join_collapse_limit = 15

OS configuration:
------------------
I use Redhat 4 AS, kernel 2.6.9-11
kernel.shmmax=1073741824
kernel.sem=250 32000 100 128
fs.aio-max-nr=5242880
the server I configure just only for postgresql, no other service is running
like: www, samba, ftp, email, firewall

hardware configuration:
------------------------
Motherboard ASUS P5GD1
Processor P4 3,2 GHz
Memory 2 GB DDR 400,
2x200 GB Serial ATA 7200 RPM UltraATA/133, configure as RAID0 for postgresql
data and the partition is EXT3
1x80 GB EIDE 7200 RPM configure for system and home directory and the
partiton is EXT3

Did I miss something?

Regards,
ahmad fajar

-----Original Message-----
From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
Sent: Jumat, 23 September 2005 18:26
To: Ahmad Fajar
Cc: pgsql-performance(at)postgresql(dot)org
Subject: RE: [PERFORM] tsearch2 seem very slow

On Fri, 23 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
> I didn't deny on the third repeat or more, it can reach < 600 msec. It is
> only because the result still in postgres cache, but how about in the
first
> run? I didn't dare, the values is un-acceptable. Because my table will
grows
> rapidly, it's about 100000 rows per-week. And the visitor will search
> anything that I don't know, whether it's the repeated search or new
search,
> or whether it's in postgres cache or not.

if you have enoush shared memory postgresql will keep index pages there.

>
> I just compare with http://www.postgresql.org, the search is quite fast,
and
> I don't know whether the site uses tsearch2 or something else. But as fas
as
> I know, if the rows reach >100 milion (I have try for 200 milion rows and
it
> seem very slow), even if don't use tsearch2, only use simple query like:
> select f1, f2 from table1 where f2='blabla',
> and f2 is indexes, my postgres still slow on the first time, about >10
sec.
> because of this I tried something brand new to fullfill my needs. I have
> used fti, and tsearch2 but still slow.
>
> I don't know what's going wrong with my postgres, what configuration must
I
> do to perform the query get fast result. Or must I use enterprisedb 2005
or
> pervasive postgres (both uses postgres), I don't know very much about
these
> two products.

you didn't show us your configuration (hardware,postgresql and tsearch2),
explain analyze of your queries, so we can't help you.
How big is your database, tsearch2 index size ?

>
> Regards,
> ahmad fajar
>
>
> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
> Sent: Jumat, 23 September 2005 14:36
> To: Ahmad Fajar
> Cc: pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] tsearch2 seem very slow
>
> Ahmad,
>
> how fast is repeated runs ? First time system could be very slow.
> Also, have you checked my page
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
> and some info about tsearch2 internals
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>
> Oleg
> On Thu, 22 Sep 2005, Ahmad Fajar wrote:
>
>> I have about 419804 rows in my article table. I have installed tsearch2
> and
>> its gist index correctly.
>>
>> My table structure is:
>>
>> CREATE TABLE tbarticles
>>
>> (
>>
>> articleid int4 NOT NULL,
>>
>> title varchar(250),
>>
>> mediaid int4,
>>
>> datee date,
>>
>> content text,
>>
>> contentvar text,
>>
>> mmcol float4 NOT NULL,
>>
>> sirkulasi float4,
>>
>> page varchar(10),
>>
>> tglisidata date,
>>
>> namapc varchar(12),
>>
>> usere varchar(12),
>>
>> file_pdf varchar(255),
>>
>> file_pdf2 varchar(50),
>>
>> kolom int4,
>>
>> size_jpeg int4,
>>
>> journalist varchar(120),
>>
>> ratebw float4,
>>
>> ratefc float4,
>>
>> fti tsvector,
>>
>> CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>>
>> ) WITHOUT OIDS;
>>
>> Create index fti_idx1 on tbarticles using gist (fti);
>>
>> Create index fti_idx2 on tbarticles using gist (datee, fti);
>>
>>
>>
>> But when I search something like:
>>
>> Select articleid, title, datee from tbarticles where fti @@
>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>>
>> It takes about 30 sec. I run explain analyze and the index is used
>> correctly.
>>
>>
>>
>> Then I try multi column index to filter by date, and my query something
>> like:
>>
>> Select articleid, title, datee from tbarticles where fti @@
>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >=
> '2002-01-01'
>> and datee <= current_date
>>
>> An it still run about 25 sec. I do run explain analyze and my multicolumn
>> index is used correctly.
>>
>> This is not acceptable if want to publish my website if the search took
> very
>> longer.
>>
>>
>>
>> I have run vacuum full analyze before doing such query. What going wrong
>> with my query?? Is there any way to make this faster?
>>
>> I have try to tune my postgres configuration, but it seem helpless. My
> linux
>> box is Redhat 4 AS, and
>>
>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and configure
> as
>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>>
>>
>>
>> Please.help.help.
>>
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ron Peacetree 2005-09-23 16:48:35 Re: [PERFORM] Releasing memory during External sorting?
Previous Message Pailloncy Jean-Gerard 2005-09-23 16:39:35 Re: Releasing memory during External sorting?