Re: Forcing the right queryplan

From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Henk van Lingen <H(dot)G(dot)K(dot)vanLingen(at)uu(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Forcing the right queryplan
Date: 2010-09-07 17:26:25
Message-ID: 8DCA2553-B84E-424B-8806-DA7A2FE6D8C0@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-novice

Sorry for not replying earlier, I've been quite busy.

On 31 Aug 2010, at 16:50, Henk van Lingen wrote:

> syslog=# \d systemevents
> Table "public.systemevents"
> Column | Type | Modi
> fiers
> --------------------+-----------------------------+-----------------------------
> ------------------------------
> id | integer | not null default nextval('sy

(...)

> message | text |

(...)

> Indexes:
> "systemevents_pkey" PRIMARY KEY, btree (id)
> "fromhost_idx" btree (fromhost)
> "msgs_idx" gin (to_tsvector('english'::regconfig, message))
>
> The GIN index is to do text searching (via LogAnalyzer).
>
> Now there are two types of query plans:

Do you have output of explain analyse for these queries as well? It's hard to see what is actually going on with just the explain - we can't see which part of the query is more expensive than the planner expected, for starters.

> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM systemevents WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) ) ORDER BY id DESC LIMIT 100; QUERY PLAN
>
> --------------------------------------------------------------------------------
> ---------------------------------
> Limit (cost=0.00..10177.22 rows=100 width=159)
> -> Index Scan Backward using systemevents_pkey on systemevents (cost=0.00..
> 1052934.86 rows=10346 width=159)
> Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.
> 211.112.9'::text))
> (3 rows)
>
> This one is useless (takes very long). However this one:
>
> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM systemevents WHERE ( ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) ) ORDER BY id DESC LIMIT 500;
> QUERY PLAN
>
> --------------------------------------------------------------------------------
> -----------------------------------
> Limit (cost=40928.89..40930.14 rows=500 width=159)
> -> Sort (cost=40928.89..40954.76 rows=10346 width=159)
> Sort Key: id
> -> Bitmap Heap Scan on systemevents (cost=2898.06..40413.36 rows=1034
> 6 width=159)
> Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_t
> squery('131.211.112.9'::text))
> -> Bitmap Index Scan on msgs_idx (cost=0.00..2895.47 rows=10346
> width=0)
> Index Cond: (to_tsvector('english'::regconfig, message) @@
> to_tsquery('131.211.112.9'::text))
> (7 rows)
>
> works acceptable.

Odd that more records and a more complicated plan gives faster results... That's why I think we'd really want to see explain analyse output.
I'm guessing that there are a lot of records matching your search string and that you've found the cut-off point where the planner thinks you're throwing away enough rows that it's not very useful to first select all the matching records before sorting the results.

I think it decided to just start searching backwards along the id and returning the rows that match that IP (and are visible to your transaction) would be faster than trying to work with all the rows that match that IP.

This probably means it misjudged the costs of sorting your index backwards, which indicates that your planning statistics are off, or that your cost parameters aren't appropriate for your system.

One thing I do notice is that the first plan uses the index on id instead of the ts_vector one. For queries like those you could try to use a combined index like this:

CREATE INDEX msgs_idx (to_tsvector('english'::regconfig, message), id) ON systemevents USING (gin);

Whether to use gist or gin there I really don't know. I'd probably go for gist, I seem to recall that gin is fairly heavy to use.

> How to use the right plan regardless of the 'LIMIT-size'?

You could try turning off planner-options, but that's probably a fairly bad idea.

Other options are to use a prepared statement or a stored procedure with the IP as a parameter, which force the planner to use a more general plan because it doesn't know which values you're going to search for before it plans the query.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4c8675d010409863511634!

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2010-09-07 18:41:01 Re: Forcing the right queryplan
Previous Message Craig Ringer 2010-09-07 15:47:50 Re: Configure default for sorting of null-values?

Browse pgsql-novice by date

  From Date Subject
Next Message Craig Barnes 2010-09-07 17:32:51 Re: Fwd: StackBuilder can't fetch application list?
Previous Message Turner, John J 2010-09-07 15:43:04 Re: Fwd: StackBuilder can't fetch application list?