Re: vacuum analyze slows sql query

From: patrick ~ <sidsrr(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: vacuum analyze slows sql query
Date: 2004-11-08 18:57:02
Message-ID: 20041108185702.99883.qmail@web52103.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Sorry for the late reply. Was feeling a bit under the weather
this weekend and didn't get a chance to look at this.

--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> patrick ~ <sidsrr(at)yahoo(dot)com> writes:
> > PREPARE pkk_00 ( integer ) <the def of pkk_offer_has_pending_purc( integer
> )
>
> This is what you want to do, but not quite like that. The PREPARE
> determines the plan and so VACUUMing and re-EXECUTing is going to show
> the same plan. What we need to look at is
> - standing start
> PREPARE pkk_00 ...
> EXPLAIN ANALYZE EXECUTE pkk_00 ...
> VACUUM ANALYZE;
> PREPARE pkk_01 ...
> EXPLAIN ANALYZE EXECUTE pkk_01 ...

But of course! I feel a bit silly now.

This is what I get after following Tom's directions:

pkk=# prepare pkk_00 ( integer ) as select ...
PREPARE
Time: 1.753 ms
pkk=# execute pkk_00( 241 );
case
------
f
(1 row)

Time: 0.788 ms
pkk=# explain analyze execute pkk_00( 241 );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=10.73..10.74 rows=1 width=0) (actual time=0.067..0.068 rows=1
loops=1)
InitPlan
-> Limit (cost=0.00..10.73 rows=1 width=4) (actual time=0.055..0.055
rows=0 loops=1)
-> Index Scan using pur_offer_id_idx on pkk_purchase p0
(cost=0.00..20690.18 rows=1929 width=4) (actual time=0.052..0.052 rows=0
loops=1)
Index Cond: (offer_id = $1)
Filter: ((((expire_time)::timestamp with time zone > now()) OR
(expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR
(pending = true)))
Total runtime: 0.213 ms
(7 rows)

Time: 24.654 ms
pkk=# vacuum analyze ;
VACUUM
Time: 128826.078 ms
pkk=# prepare pkk_01 ( integer ) as select ...
PREPARE
Time: 104.658 ms
pkk=# execute pkk_01( 241 );
case
------
f
(1 row)

Time: 7652.708 ms
pkk=# explain analyze execute pkk_01( 241 );
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=2.66..2.67 rows=1 width=0) (actual time=2872.211..2872.213
rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..2.66 rows=1 width=4) (actual
time=2872.189..2872.189 rows=0 loops=1)
-> Seq Scan on pkk_purchase p0 (cost=0.00..37225.83 rows=13983
width=4) (actual time=2872.180..2872.180 rows=0 loops=1)
Filter: ((offer_id = $1) AND (((expire_time)::timestamp with
time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND
((cancel_date IS NULL) OR (pending = true)))
Total runtime: 2872.339 ms
(6 rows)

Time: 2873.479 ms

So it looks like after the VACCUM the planner resorts to Seq Scan
rather than Index Scan.

This is because of the value of correlation field in pg_stats
(according to PostgreSQL docs) being closer to 0 rather than
1:

pkk=# select tablename,attname,correlation from pg_stats where tablename =
'pkk_purchase' and attname = 'offer_id' ;
tablename | attname | correlation
--------------+----------+-------------
pkk_purchase | offer_id | 0.428598
(1 row)

So I started to experiment with ALTER TABLE SET STATISTICS
values to see which gets the correlation closer to 1. The
trend seems to indicat the higher the stat value is set it
pushes the correlation value closer to 0:

set statistics correlation
----------------------------
800 0.393108
500 0.408137
200 0.43197
50 0.435211
1 0.45758

And a subsequent PREPARE and EXPLAIN ANALYZE confirms that
the Planer reverts back to using the Index Scan after setting
stats to 1 (even though correlation value is still closer
to 0 than 1):

pkk=# explain analyze execute pkk_02( 241 );

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=2.95..2.96 rows=1 width=0) (actual time=0.068..0.069 rows=1
loops=1)
InitPlan
-> Limit (cost=0.00..2.95 rows=1 width=4) (actual time=0.056..0.056
rows=0 loops=1)
-> Index Scan using pur_offer_id_idx on pkk_purchase p0
(cost=0.00..35810.51 rows=12119 width=4) (actual time=0.053..0.053 rows=0
loops=1)
Index Cond: (offer_id = $1)
Filter: ((((expire_time)::timestamp with time zone > now()) OR
(expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR
(pending = true)))
Total runtime: 0.200 ms
(7 rows)

So, is this the ultimate solution to this issue?

--patrick


__________________________________
Do you Yahoo!?
Check out the new Yahoo! Front Page.
www.yahoo.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message John Meinel 2004-11-08 19:29:39 Re: vacuum analyze slows sql query
Previous Message Josh Berkus 2004-11-08 17:44:33 Re: Question regarding the file system