Re: tsearch2 seem very slow

From: "Ahmad Fajar" <gendowo(at)konphalindo(dot)or(dot)id>
To: "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: tsearch2 seem very slow
Date: 2005-09-25 18:14:46
Message-ID: SVONEF9ulRvv1BbGRCp00000239@ki-communication.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Oleg,

Sorry for my late. From the stat() function I got 1,5 million rows, although
I've added garbage words to the stop word file, there seem still have
garbage words. So I ask for my team to identify the garbage words and add to
stop words and I will update the articles after that. And about my articles,
it is quite big enough, the average length is about 2900 characters. And I
think, I have to tune tsearch2 and concentrate to the garbage words. The
most articles are indonesian language. What others way to tune the tsearch2
beside the garbage words?

Beside that, I still have problem, if I do a simple query like:
Select ids, keywords from dict where keywords='blabla' ('blabla' is a single
word); The table have 200 million rows, I have index the keywords field. On
the first time my query seem to slow to get the result, about 15-60 sec to
get the result. I use latest pgAdmin3 to test all queries. But if I repeat
the query I will get fast result. My question is why on the first time the
query seem to slow.

I try to cluster the table base on keyword index, but after 15 hours waiting
and it doesn't finish I stop clustering. Now I think I have to change the
file system for postgresql data. Do you have any idea what best for
postgresql, JFS or XFS? I will not try reiserfs, because there are some
rumors about reiserfs stability, although reiserfs is fast enough for
postgresql. And must I down grade my postgresql from version 8.0.3 to 7.4.8?

Regards,
ahmad fajar

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Oleg Bartunov
Sent: Saturday, September 24, 2005 1:08 PM
To: Ahmad Fajar
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] tsearch2 seem very slow

Ahmad,

what's about the number of unique words ? I mean stat() function.
Sometimes, it helps to identify garbage words.
How big is your articles (average length) ?

please, cut'n paste queries and output from psql ! How fast are
next queries ?

Oleg
On Fri, 23 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
> For single index I try this query:
> explain analyze
> select articleid, title, datee from articles
> where fti @@ to_tsquery('bank&indonesia');
>
> analyze result:
> ----------------
> "Index Scan using fti_idx on articles (cost=0.00..862.97 rows=420
width=51)
> (actual time=0.067..183761.324 rows=46186 loops=1)"
> " Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)"
> "Total runtime: 183837.826 ms"
>
> And for multicolumn index I try this query:
> explain analyze
> select articleid, title, datee from articles
> where fti @@ to_tsquery('bank&mega');
>
> analyze result:
> ----------------
> "Index Scan using articles_x1 on articles (cost=0.00..848.01 rows=410
> width=51) (actual time=52.204..37914.135 rows=1841 loops=1)"
> " Index Cond: ((datee >= '2002-01-01'::date) AND (datee <=
> ('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))"
> "Total runtime: 37933.757 ms"
>
> The table structure is as mention on the first talk. If you wanna know how
> much table in my database, it's about 100 tables or maybe more. Now I
> develop the version 2 of my web application, you can take a look at:
> http://www.mediatrac.net, so it will hold many datas. But the biggest
table
> is article's table. On develop this version 2 I just use half data of the
> article's table (about 419804 rows). May be if I import all of the
article's
> table data it will have 1 million rows. The article's table grows rapidly,
> about 100000 rows per-week. My developing database size is 28 GB (not real
> database, coz I still develop the version 2 and I use half of the data for
> play around). I just wanna to perform quick search (fulltext search) on my
> article's table not other table. On version 1, the current running version
I
> use same hardware spesification as mention below, but there is no fulltext
> search. So I develop the new version with new features, new interface and
> include the fulltext search.
>
> I do know, if the application finish, I must use powerfull hardware. But
how
> can I guarantee the application will run smooth, if I do fulltext search
on
> 419804 rows in a table it took a long time to get the result.
>
> Could you or friends in this maling-list help me....plz..plzz
>
> Tsearch2 configuration:
> -------------------------
> I use default configuration, english stop word file as tsearch2 provide,
> stem dictionary as default (coz I don't know how to configure and add new
> data to stem dictionary) and I add some words to the english stop word
file.
>
> Postgresql configuration
> -------------------------
> max_connections = 32
> shared_buffers = 32768
> sort_mem = 8192
> vacuum_mem = 65536
> work_mem = 16384
> maintenance_work_mem = 65536
> max_fsm_pages = 30000
> max_fsm_relations = 1000
> max_files_per_process = 100000
> checkpoint_segments = 15
> effective_cache_size = 192000
> random_page_cost = 2
> geqo = true
> geqo_threshold = 50
> geqo_effort = 5
> geqo_pool_size = 0
> geqo_generations = 0
> geqo_selection_bias = 2.0
> from_collapse_limit = 10
> join_collapse_limit = 15
>
> OS configuration:
> ------------------
> I use Redhat 4 AS, kernel 2.6.9-11
> kernel.shmmax=1073741824
> kernel.sem=250 32000 100 128
> fs.aio-max-nr=5242880
> the server I configure just only for postgresql, no other service is
running
> like: www, samba, ftp, email, firewall
>
> hardware configuration:
> ------------------------
> Motherboard ASUS P5GD1
> Processor P4 3,2 GHz
> Memory 2 GB DDR 400,
> 2x200 GB Serial ATA 7200 RPM UltraATA/133, configure as RAID0 for
postgresql
> data and the partition is EXT3
> 1x80 GB EIDE 7200 RPM configure for system and home directory and the
> partiton is EXT3
>
> Did I miss something?
>
> Regards,
> ahmad fajar
>
>
> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
> Sent: Jumat, 23 September 2005 18:26
> To: Ahmad Fajar
> Cc: pgsql-performance(at)postgresql(dot)org
> Subject: RE: [PERFORM] tsearch2 seem very slow
>
> On Fri, 23 Sep 2005, Ahmad Fajar wrote:
>
>> Hi Oleg,
>>
>> I didn't deny on the third repeat or more, it can reach < 600 msec. It is
>> only because the result still in postgres cache, but how about in the
> first
>> run? I didn't dare, the values is un-acceptable. Because my table will
> grows
>> rapidly, it's about 100000 rows per-week. And the visitor will search
>> anything that I don't know, whether it's the repeated search or new
> search,
>> or whether it's in postgres cache or not.
>
> if you have enoush shared memory postgresql will keep index pages there.
>
>
>>
>> I just compare with http://www.postgresql.org, the search is quite fast,
> and
>> I don't know whether the site uses tsearch2 or something else. But as fas
> as
>> I know, if the rows reach >100 milion (I have try for 200 milion rows and
> it
>> seem very slow), even if don't use tsearch2, only use simple query like:
>> select f1, f2 from table1 where f2='blabla',
>> and f2 is indexes, my postgres still slow on the first time, about >10
> sec.
>> because of this I tried something brand new to fullfill my needs. I have
>> used fti, and tsearch2 but still slow.
>>
>> I don't know what's going wrong with my postgres, what configuration must
> I
>> do to perform the query get fast result. Or must I use enterprisedb 2005
> or
>> pervasive postgres (both uses postgres), I don't know very much about
> these
>> two products.
>
> you didn't show us your configuration (hardware,postgresql and tsearch2),
> explain analyze of your queries, so we can't help you.
> How big is your database, tsearch2 index size ?
>
>
>>
>> Regards,
>> ahmad fajar
>>
>>
>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
>> Sent: Jumat, 23 September 2005 14:36
>> To: Ahmad Fajar
>> Cc: pgsql-performance(at)postgresql(dot)org
>> Subject: Re: [PERFORM] tsearch2 seem very slow
>>
>> Ahmad,
>>
>> how fast is repeated runs ? First time system could be very slow.
>> Also, have you checked my page
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>> and some info about tsearch2 internals
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>>
>> Oleg
>> On Thu, 22 Sep 2005, Ahmad Fajar wrote:
>>
>>> I have about 419804 rows in my article table. I have installed tsearch2
>> and
>>> its gist index correctly.
>>>
>>> My table structure is:
>>>
>>> CREATE TABLE tbarticles
>>>
>>> (
>>>
>>> articleid int4 NOT NULL,
>>>
>>> title varchar(250),
>>>
>>> mediaid int4,
>>>
>>> datee date,
>>>
>>> content text,
>>>
>>> contentvar text,
>>>
>>> mmcol float4 NOT NULL,
>>>
>>> sirkulasi float4,
>>>
>>> page varchar(10),
>>>
>>> tglisidata date,
>>>
>>> namapc varchar(12),
>>>
>>> usere varchar(12),
>>>
>>> file_pdf varchar(255),
>>>
>>> file_pdf2 varchar(50),
>>>
>>> kolom int4,
>>>
>>> size_jpeg int4,
>>>
>>> journalist varchar(120),
>>>
>>> ratebw float4,
>>>
>>> ratefc float4,
>>>
>>> fti tsvector,
>>>
>>> CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>>>
>>> ) WITHOUT OIDS;
>>>
>>> Create index fti_idx1 on tbarticles using gist (fti);
>>>
>>> Create index fti_idx2 on tbarticles using gist (datee, fti);
>>>
>>>
>>>
>>> But when I search something like:
>>>
>>> Select articleid, title, datee from tbarticles where fti @@
>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>>>
>>> It takes about 30 sec. I run explain analyze and the index is used
>>> correctly.
>>>
>>>
>>>
>>> Then I try multi column index to filter by date, and my query something
>>> like:
>>>
>>> Select articleid, title, datee from tbarticles where fti @@
>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >=
>> '2002-01-01'
>>> and datee <= current_date
>>>
>>> An it still run about 25 sec. I do run explain analyze and my
multicolumn
>>> index is used correctly.
>>>
>>> This is not acceptable if want to publish my website if the search took
>> very
>>> longer.
>>>
>>>
>>>
>>> I have run vacuum full analyze before doing such query. What going wrong
>>> with my query?? Is there any way to make this faster?
>>>
>>> I have try to tune my postgres configuration, but it seem helpless. My
>> linux
>>> box is Redhat 4 AS, and
>>>
>>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and
configure
>> as
>>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>>>
>>>
>>>
>>> Please.help.help.
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Oleg Bartunov 2005-09-25 18:33:02 Re: tsearch2 seem very slow
Previous Message Michael Stone 2005-09-25 18:13:52 Re: Advice on RAID card