Re: Forcing the right queryplan

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Henk van Lingen <H(dot)G(dot)K(dot)vanLingen(at)uu(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Forcing the right queryplan
Date: 2010-09-03 07:20:39
Message-ID: 4C80A1C7.9040006@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-novice

Henk van Lingen wrote:
> Now there are two types of query plans:
>
> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM systemevents WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) ) ORDER BY id DESC LIMIT 100; QUERY PLAN
>
> Limit (cost=0.00..10177.22 rows=100 width=159)
> -> Index Scan Backward using systemevents_pkey on systemevents (cost=0.00..
> 1052934.86 rows=10346 width=159)
> Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.
> 211.112.9'::text))
> (3 rows)
>
> This one is useless (takes very long). However this one:
>
Hello Henk,

I saw your other mail today, I'm replying on this one for better formatting.

With a limit of 100 the planner guesses it will find 100 matching rows
within some cost. At 500 rows the cost is higher than that of the second
plan:
> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM systemevents WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) ) ORDER BY id DESC LIMIT 500;
> QUERY PLAN
>
> --------------------------------------------------------------------------------
> -----------------------------------
> Limit (cost=40928.89..40930.14 rows=500 width=159)
> -> Sort (cost=40928.89..40954.76 rows=10346 width=159)
> Sort Key: id
> -> Bitmap Heap Scan on systemevents (cost=2898.06..40413.36 rows=1034
> 6 width=159)
> Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_t
> squery('131.211.112.9'::text))
> -> Bitmap Index Scan on msgs_idx (cost=0.00..2895.47 rows=10346
> width=0)
> Index Cond: (to_tsvector('english'::regconfig, message) @@
> to_tsquery('131.211.112.9'::text))
> (7 rows)
>
> works acceptable.
>
> How to use the right plan regardless of the 'LIMIT-size'?
>
The planner obviously thinks it will have read 100 rows from
systemevents backwards earlier than it actually does, with the where
clause that contains the scanning for string 131.211.112.9. Increasing
the stats target in this case will probably not help, since the
statistics will not contain selectivity for all possible ts queries.

If the index is useless anyway, you might consider dropping it.
Otherwise, increasing random_page_cost might help in choosing the
otherplan, but on the other hand that plan has index scanning too, so
I'm not to sure there.

If that doesn't help, it would be interesting to see some output of
vmstat 1 (or better: iostat -xk 1) to see what is the bottleneck during
execution of the first plan. If it is IO bound, you might want to
increase RAM or add spindles for increased random io performance. If it
is CPU bound, it is probably because of executing the to_tsvector
function. In that case it might be interesting to see if changing
ts_vectors cost (see ALTER FUNCTION ... COST .../
http://developer.postgresql.org/pgdocs/postgres/sql-alterfunction.html)
again helps the planner to favor the second plan over the first.

regards,
Yeb Havinga

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2010-09-03 07:41:17 Re: Compiling extension C with MingW in windows, Error...
Previous Message Devrim GÜNDÜZ 2010-09-03 06:24:06 Re: On-disk size of db increased after restore

Browse pgsql-novice by date

  From Date Subject
Next Message Turner, John J 2010-09-03 14:21:36 Re: Fwd: StackBuilder can't fetch application list?
Previous Message Sachin Srivastava 2010-09-03 07:18:35 Fwd: Re: Fwd: StackBuilder can't fetch application list?