text search query planning

From: Henk van Lingen <H(dot)G(dot)K(dot)vanLingen(at)uu(dot)nl>
To: pgsql-general(at)postgresql(dot)org
Subject: text search query planning
Date: 2009-11-19 11:04:47
Message-ID: 20091119110447.GA17878@uu.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Hi,

I've a problem understanding of slow query. I use 8.4 for storing
syslog messages, on which i want to do text searching. This table:

syslog=# \d fw_msgs
Table "public.fw_msgs"
Column | Type | Modifiers
----------+------+-----------
datetime | text |
facility | text |
level | text |
host | text |
program | text |
pid | text |
message | text |
Indexes:
"fw_msgs_datetime_idx" btree (datetime)
"fw_msgs_facility_idx" btree (facility)
"fw_msgs_host_idx" btree (host)
"fw_msgs_idx" gin (to_tsvector('english'::regconfig, message))
"fw_msgs_program_idx" btree (program)

contains millions of records.

This query is fast:

syslog=# explain select datetime,facility,level,host,program,pid,message from fw_msgs where to_tsvector('english', message) @@ to_tsquery ('131.211.1.124') limit 3000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (cost=17658.01..25225.08 rows=3000 width=176)
-> Bitmap Heap Scan on fw_msgs (cost=17658.01..181537.99 rows=64971 width=176)
Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.1.124'::text))
-> Bitmap Index Scan on fw_msgs_idx (cost=0.00..17641.77 rows=64971 width=0)
Index Cond: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.1.124'::text))
(5 rows)

However, when I want to order by datetime, it takes for ever. It seems the db
insist on first sorting on datetime instead of using the gin index:

syslog=# explain select datetime,facility,level,host,program,pid,message from fw_msgs where to_tsvector('english', message) @@ to_tsquery ('131.211.1.124') order by datetime limit 3000;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Limit (cost=0.00..44985.68 rows=3000 width=176)
-> Index Scan using fw_msgs_datetime_idx on fw_msgs (cost=0.00..974644.66 rows=64997 width=176)
Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.211.1.124'::text))
(3 rows)

doing something like

select datetime,facility,level,host,program,pid,message from fw_msgs where message in (select message from fw_msgs where to_tsvector('english', message) @@ to_tsquery ('131.211.1.124') ) order by datetime desc limit 3000;

doesn't help.

Any ideas?

Regards,
--
Henk van Lingen, (o- -+
Netwerk & Telefonie, ICT Service Center /\ |
Universiteit Utrecht, Jenalaan 18a, kamer 0.12 v_/_
http://henk.vanlingen.net/ http://www.tuxtown.net/netiquette/

Browse pgsql-general by date

  From Date Subject
Next Message Pedro Doria Meunier 2009-11-19 12:53:00 obtaining ARRAY position for a given match
Previous Message Tech 2010 2009-11-19 10:42:34 Re: vacuumdb: vacuuming of database "xy" failed: PANIC: corrupted item pointer: 19227