Re: Increasing pattern index query speed

From: "Andrus" <kobruleht2(at)hot(dot)ee>
To: "Scott Carey" <scott(at)richrelevance(dot)com>, "Richard Huxton" <dev(at)archonet(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Increasing pattern index query speed
Date: 2008-11-26 19:24:48
Message-ID: 91652AA4F32448288A88CD62CF84B330@andrusnotebook
Lists: pgsql-performance

Scott,

>My first thought on the query where a pattern being faster than the query
>with an exact value is that the planner does not have good enough
>statistics on that column. Without looking at the explain plans further, I
>would suggest trying something simple. The fact that it is faster on 8.3
>but slower on 8.1 may have to do with changes between versions, or may
>simply be due to luck in the statistics sampling.
>See if increasing the statistics target on that column significantly does
>anything:
>EXPLAIN (your query);
>ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000;
>ANALYZE orders_products;
>EXPLAIN (your query);
>2000 is simply a guess of mine for a value much larger than the default.
>This will generally make query planning slower but the system will have a
>lot more data about that column and the distribution of data in it. This
>should help stabilize the query performance.
>If this has an effect, the query plans will change.
>Your question below really boils down to something more simple:
> --Why is the most optimal query plan not chosen? This is usually due to
> either insufficient statistics or quirks in how the query planner works on
> a specific data set or with certain configuration options.

Thank you very much.
I found that AND dok.kuupaev = date'2008-11-21' runs fast, but
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21' runs very
slow.
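
For clarity, the two variants differ only in the date predicate; the planner
estimates an equality condition directly from the column statistics, while a
range condition is estimated as a range scan even when both bounds are the
same value. A sketch of the two forms, based on the query quoted in this
thread:

```sql
-- fast: equality predicate on the date column
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode LIKE '99000010%'
  AND dok.kuupaev = date '2008-11-21';

-- slow on 8.1: range predicate, even though the bounds are identical
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode LIKE '99000010%'
  AND dok.kuupaev BETWEEN date '2008-11-21' AND date '2008-11-21';
```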

explain SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like '99000010%'
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'

plan with default statistics:

"Aggregate  (cost=17.86..17.87 rows=1 width=0)"
"  ->  Nested Loop  (cost=0.00..17.85 rows=1 width=0)"
"        ->  Nested Loop  (cost=0.00..11.84 rows=1 width=24)"
"              Join Filter: ("outer".dokumnr = "inner".dokumnr)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 rows=1 width=4)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_toode_pattern_idx on rid  (cost=0.00..6.01 rows=1 width=28)"
"                    Index Cond: ((toode ~>=~ '99000010'::bpchar) AND (toode ~<~ '99000011'::bpchar))"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 width=24)"
"              Index Cond: ("outer".toode = toode.toode)"

After the statistics target is changed, the query runs fast (70 ... 1000 ms):

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;
analyze rid;
explain analyze SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode like '99000010%'
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'
"Aggregate  (cost=27.04..27.05 rows=1 width=0) (actual time=44.830..44.834 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..27.04 rows=1 width=0) (actual time=0.727..44.370 rows=108 loops=1)"
"        ->  Nested Loop  (cost=0.00..21.02 rows=1 width=24) (actual time=0.688..40.519 rows=108 loops=1)"
"              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 rows=1 width=4) (actual time=0.027..8.094 rows=1678 loops=1)"
"                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))"
"              ->  Index Scan using rid_dokumnr_idx on rid  (cost=0.00..15.20 rows=1 width=28) (actual time=0.011..0.011 rows=0 loops=1678)"
"                    Index Cond: ("outer".dokumnr = rid.dokumnr)"
"                    Filter: (toode ~~ '99000010%'::text)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 width=24) (actual time=0.016..0.020 rows=1 loops=108)"
"              Index Cond: ("outer".toode = toode.toode)"
"Total runtime: 45.050 ms"

It seems that you are a genius.

I used 1000 since the documentation says the maximum value is 1000.

The rid table contains 3.5 million rows, grows by about 1 million rows per
year, and is updated frequently, mostly by inserts.

Is it OK to leave the

SET STATISTICS 1000;

setting on this column, or should I try to decrease it?
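
In case it helps while experimenting, the per-column target can be lowered or
reset at any time; this is only a sketch, where -1 means "fall back to the
server-wide default_statistics_target":

```sql
-- Try a smaller target, re-analyze, and re-check the plan
ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 300;
ANALYZE rid;

-- Or reset to the server-wide default
ALTER TABLE rid ALTER COLUMN toode SET STATISTICS -1;
ANALYZE rid;

-- Inspect the current per-column setting
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'rid'::regclass
  AND attname = 'toode';
```

Each ALTER should be followed by ANALYZE, since the target only controls how
much data the next ANALYZE collects.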

Andrus.
