Skip site navigation (1) Skip section navigation (2)

TSearch2 and optimisation ...

From: Hervé Piedvache <footcow(at)noos(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: TSearch2 and optimisation ...
Date: 2004-08-25 22:48:32
Message-ID: 200408260048.32274.footcow@noos.fr (view raw or flat)
Thread:
Lists: pgsql-performance
Hi,

I'm a little beginner with Tsearch2 ....

I have simples tables like this :

# \d article
                                         Table "public.article"
   Column   |            Type             |                            
Modifiers
------------+-----------------------------+-----------------------------------------------------------------
 id     | integer                     | not null default 
nextval('public.article_rss_id_rss_seq'::text)
 id_site    | integer                     | not null
 title  | text                        |
 url    | text                        |
 desc   | text                        |
 r_date   | timestamp without time zone | default now()
 r_update | timestamp without time zone | default now()
 idxfti     | tsvector                    |
Indexes:
    "article_id_key" unique, btree (id)
    "idxfti_idx" gist (idxfti)
    "ix_article_update" btree (r_update)
    "ix_article_url" btree (url)
    "ix_id_site" btree (id_site)
Foreign-key constraints:
    "$1" FOREIGN KEY (id_site) REFERENCES site (id_site)
Triggers:
    tsvectorupdate BEFORE INSERT OR UPDATE ON article FOR EACH ROW EXECUTE 
PROCEDURE tsearch2('idxfti', 'title', 'desc')

# \d site_rss
                                Table "public.site"
    Column    |  Type   |                           Modifiers
--------------+---------+---------------------------------------------------------------
 id_site      | integer | not null default 
nextval('public.site_id_site_seq'::text)
 site_name    | text    |
 site_url     | text    |
url             | text    |
 language     | text    |
 datecrea     | date    | default now()
 id_category  | integer |
 time_refresh | integer |
 active       | integer |
 error        | integer |
Indexes:
    "site_id_site_key" unique, btree (id_site)
    "ix_site_id_category" btree (id_category)
    "ix_site_url" btree (url)

# \d user_choice
  Table "public.user_choice"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id_user | integer |
 id_site | integer |
Indexes:
    "ix_user_choice_all" unique, btree (id_user, id_site)

I have done a simple request, looking for title or description having Postgres 
inside order by rank and date,  like this :
SELECT a.title, a.id, a.url, to_char(a.r_date, 'DD/MM/YYYY HH24:MI:SS') as dt, 
s.site_name, s.id_site, case when exists (select id_user from user_choice u 
where u.id_site=s.id_site and u.id_user = 1) then 1 else 0 end as bookmarked
    FROM article a, site s
  WHERE s.id_site = a.id_site
       AND idxfti @@ to_tsquery('postgresql')
   ORDER BY rank(idxfti, to_tsquery('postgresql')) DESC, a.r_date DESC;

The request takes about 4 seconds ... I have about 1 400 000 records in 
article and 36 000 records in site table ... it's a Bi-Pentium III 933 MHz 
server with 1 Gb memory ... I'm using Postgresql 7.4.5
For me this result is very very slow I really need a quicker result with less 
than 1 second ...
The next time I call the same request I have got the result in 439 ms ... but 
If I replace "Postgresql" in my find with "Linux" for example I will get the 
next result in 5 seconds ... :o(

Is it a bad use of Tsearch2 ... or a bad table structure ... or from my 
request ... ? I have no idea how to optimise this ...

Explain gives me this result :
                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Sort  (cost=10720.91..10724.29 rows=1351 width=191)
   Sort Key: rank(a.idxfti, '\'postgresql\''::tsquery), a.r_date
   ->  Merge Join  (cost=4123.09..10650.66 rows=1351 width=191)
         Merge Cond: ("outer".id_site = "inner".id_site)
         ->  Index Scan using site_id_site_key on site s  (cost=0.00..2834.96 
rows=35705 width=28)
         ->  Sort  (cost=4123.09..4126.47 rows=1351 width=167)
               Sort Key: a.id_site
               ->  Index Scan using idxfti_idx on article a  
(cost=0.00..4052.84 rows=1351 width=167)
                     Index Cond: (idxfti @@ '\'postgresql\''::tsquery)
                     Filter: (idxfti @@ '\'postgresql\''::tsquery)
         SubPlan
           ->  Seq Scan on user_choice u  (cost=0.00..2.69 rows=1 width=4)
                 Filter: ((id_site = $0) AND (id_user = 1))
(13 rows)

Any idea are well done ;o)

Regards,
-- 
Bill Footcow


Responses

pgsql-performance by date

Next:From: Doug McNaughtDate: 2004-08-25 22:52:05
Subject: Re: Anyone familiar with Apple Xserve RAID
Previous:From: Josh BerkusDate: 2004-08-25 21:22:08
Subject: Re: Anyone familiar with Apple Xserve RAID

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group