Re: planner question..

From: Rajesh Kumar Mallah <mallah(at)trade-india(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: planner question..
Date: 2003-04-17 15:10:06
Message-ID: 200304172040.06654.mallah@trade-india.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Thursday 17 Apr 2003 7:56 pm, you wrote:
> Rajesh Kumar Mallah <mallah(at)trade-india(dot)com> writes:
> > For a distribution of data like below why does the planner
> > choses to do an index scan by default for source = 'REGIS' when > 50%
> > of the rows are having source='REGIS'.
>
> Are there a huge number of dead rows in the table? ("vacuum verbose"
> would give some info)
>

Yes as you predicted after a vacuum verbose the planner did
switch to seq scan and was only marginally slower than index
scan. (do u think we shud live with it?)

so whats the moral ?
ANALYZE itself is not sufficient in updating pg_statistic,
we must do VACUUM ANALYZE always ,
if so why would a seperate ANALYZE command exist ?

here is the info (in the same order as commands were executed)

tradein_clients=# explain analyze SELECT count(*) from general.profile_master where source='REGIS';
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Aggregate (cost=18225.48..18225.48 rows=1 width=0) (actual time=6295.33..6295.34 rows=1 loops=1) |
| -> Index Scan using profile_master_index_source on profile_master (cost=0.00..18213.31 rows=4871 width=0) (actual time=14.54..6132.61 rows=102090 loops=1) |
| Index Cond: (source = 'REGIS'::character varying) |
| Total runtime: 6295.41 msec |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
(4 rows)

tradein_clients=# VACUUM VERBOSE general.profile_master;
INFO: --Relation general.profile_master--
INFO: Index profile_master_profile_id_pkey: Pages 426; Tuples 178031: Deleted 11802.
CPU 0.02s/0.12u sec elapsed 0.24 sec.
INFO: Index profile_master_index_userid: Pages 431; Tuples 178031: Deleted 8940.
CPU 0.00s/0.13u sec elapsed 0.14 sec.
INFO: Index profile_master_index_creation_date: Pages 420; Tuples 178031: Deleted 8940.
CPU 0.02s/0.12u sec elapsed 0.20 sec.
INFO: Index profile_master_index_company_id: Pages 419; Tuples 178031: Deleted 8940.
CPU 0.01s/0.10u sec elapsed 0.15 sec.
INFO: Index profile_master_index_eyp_list_id: Pages 419; Tuples 178031: Deleted 8940.
CPU 0.01s/0.11u sec elapsed 0.45 sec.
INFO: Index profile_master_index_iid_list_id: Pages 420; Tuples 178031: Deleted 8940.
CPU 0.01s/0.10u sec elapsed 0.46 sec.
INFO: Index profile_master_index_ip_list_id: Pages 421; Tuples 178031: Deleted 8940.
CPU 0.00s/0.11u sec elapsed 0.12 sec.
INFO: Index profile_master_index_catalog_company_id: Pages 419; Tuples 178031: Deleted 8940.
CPU 0.00s/0.12u sec elapsed 0.46 sec.
INFO: Index profile_master_index_source: Pages 619; Tuples 178031: Deleted 8940.
CPU 0.02s/0.09u sec elapsed 0.29 sec.
INFO: Removed 11802 tuples in 495 pages.
CPU 0.01s/0.03u sec elapsed 0.14 sec.
INFO: Pages 23923: Changed 36, Empty 0; Tup 178031: Vac 11802, Keep 0, UnUsed 325029.
Total CPU 0.96s/1.19u sec elapsed 11.97 sec.
INFO: --Relation pg_toast.pg_toast_122045388--
INFO: Pages 31: Changed 0, Empty 0; Tup 189: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.03 sec.
VACUUM

tradein_clients=# explain analyze SELECT count(*) from general.profile_master where source='REGIS';
+-----------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+-----------------------------------------------------------------------------------------------------------------------------+
| Aggregate (cost=26398.37..26398.37 rows=1 width=0) (actual time=8355.76..8355.76 rows=1 loops=1) |
| -> Seq Scan on profile_master (cost=0.00..26148.39 rows=99994 width=0) (actual time=16.23..8237.53 rows=102090 loops=1) |
| Filter: (source = 'REGIS'::character varying) |
| Total runtime: 8355.88 msec |
+-----------------------------------------------------------------------------------------------------------------------------+
(4 rows)

> The given result seems suspect; an indexscan couldn't possibly read >50%
> of the rows in less than a quarter of the time for a seqscan. Unless
> (a) the table contains vast amounts of empty space that the seqscan has to
> slog through, or

Sorry i dont understand the vacuum verbose output throughly , does it looks like
though ?

(b) your second measurement is bogus due to caching
> performed by the first measurement.

I am not sure this time but i have the habit of running
EXPLAIN ANALYZE thrice and post the middle one ;-)

>
> Also, might the table be in order by the "source" column? A
> sufficiently high correlation might have persuaded the planner to try an
> indexscan even if point (a) isn't true.

Yes the data is loaded from one source completely before loading
from another source. So they were in order but i did a lot of updates
and deletes.

regds
mallah.

>
> regards, tom lane
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Rajesh Kumar Mallah 2003-04-17 15:12:16 Re: planner question..
Previous Message Rajesh Kumar Mallah 2003-04-17 14:56:36 Re: analyse question..