Skip site navigation (1) Skip section navigation (2)

Re: Partial index on date column

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Dave Page <dpage(at)vale-housing(dot)co(dot)uk>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partial index on date column
Date: 2003-03-06 18:03:17
Message-ID: Pine.LNX.4.33.0303061054240.5572-100000@css120.ihs.com (view raw or flat)
Thread:
Lists: pgsql-hackers
On Thu, 6 Mar 2003, Dave Page wrote:

> I have a table that is likely to grow over the next few years at a rate
> of 1K-2K rows/day. As the vast majority of the activity on the table
> (other than the inserts) will be selects of data for the current day, I
> have a cron job that drops and recreates a partial index just after
> midnight. It also vacuum analyzes the table.
> 
> -- Index: public.pbx_log_today_idx
> CREATE INDEX pbx_log_today_idx ON pbx_log USING btree (pbx_time,
> pbx_call_type, pbx_digits_source, pbx_digits_destination) WHERE
> (pbx_date = '2003-03-06'::date);
> 
> I'm surprised by the following behaviour:
> 
> EXPLAIN SELECT * FROM pbx_log WHERE pbx_date = CURRENT_DATE;
> 
> Seq Scan on pbx_log  (cost=0.00..286.20 rows=1274 width=384)
>   Filter: (pbx_date = ('now'::text)::date)
> 
> 
> EXPLAIN SELECT * FROM pbx_log WHERE pbx_date = '2003-03-06';
> 
> Index Scan using pbx_log_today_idx on pbx_log  (cost=0.00..5.00 rows=0
> width=384)
>   Filter: (pbx_date = '2003-03-06'::date)
> 
> Is this just an oddity because I don't have masses of data yet (4500
> rows right now), or is this something the optimizer cannot handle?

It's not an oddity, it's the planner trying to decide which is the better 
choice, a seq scan or an index scan.  If you had 150,000,000 rows and 
asked for 149,999,000 of them, it would be counterproductive to use an 
index, since you're gonna visit nearly every page of the table anyway.

If you were gonna get 1,000 rows out of 150,000,000 then an index scan 
makes more sense,

Somewhere in between is the switch point where going from one to the other 
makes sense.

Since you've got 4500 rows and asked for 1274 of them it's likely that the 
database will have to read the whole table anyway, so it goes ahead and 
does it.

Look for random_page_cost and a few other settings near it in the 
$PGDATA/postgresql.conf file.


In response to

pgsql-hackers by date

Next:From: Tom LaneDate: 2003-03-06 18:12:35
Subject: Re: Partial index on date column
Previous:From: Bruce MomjianDate: 2003-03-06 18:01:17
Subject: Re: TODO: DROP COLUMN .. CASCADE

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group