Re: Seq scan vs. Index scan with different query

From: Andrew McMillan <andrew(at)catalyst(dot)net(dot)nz>
To: eleven(at)ludojad(dot)itpp(dot)pl
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Seq scan vs. Index scan with different query
Date: 2004-07-05 11:44:13
Message-ID: 1089027854.6664.197.camel@lamb.mcmillan.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 2004-07-05 at 12:15 +0200, eleven(at)ludojad(dot)itpp(dot)pl wrote:
> Hello,
>
> Can anybody suggest any hint on this:
>
> temp=> EXPLAIN SELECT DISTINCT "number" FROM "tablex" WHERE "Date" BETWEEN '2004-06-28'::date AND '2004-07-04'::date AND "Time" BETWEEN '00:00:00'::time AND '18:01:00'::time;
>
> Unique (cost=305669.92..306119.43 rows=89 width=8)
> -> Sort (cost=305669.92..305894.67 rows=89903 width=8)
> Sort Key: "number"
> -> Index Scan using "DateTimeIndex" on "tablex" (cost=0.00..298272.66 rows=89903 width=8)
> Index Cond: (("Date" >= '2004-06-28'::date) AND ("Date" <= '2004-07-04'::date) AND ("Time" >= '00:00:00'::time without time zone) AND ("Time" <= '18:01:00'::time without time zone))
>
>
> temp=> EXPLAIN SELECT DISTINCT "number" FROM "tablex" WHERE "Date" BETWEEN '2004-06-28'::date AND '2004-07-04'::date AND "Time" BETWEEN '00:00:00'::time AND '19:01:00'::time;
>
> Unique (cost=315252.77..315742.27 rows=97 width=8)
> -> Sort (cost=315252.77..315497.52 rows=97900 width=8)
> Sort Key: "number"
> -> Seq Scan on "tablex" (cost=0.00..307137.34 rows=97900 width=8)
> Filter: (("Date" >= '2004-06-28'::date) AND ("Date" <= '2004-07-04'::date) AND ("Time" >= '00:00:00'::time without time zone) AND ("Time" <= '19:01:00'::time without time zone))
>
> Basically, the difference is in upper "Time" value (as you can see, it's
> 18:01:00 in the first query and 19:01:00 in the other one).
> The question is - why does it use index in first case and
> it tries to do full sequential scan when the upper "Time" value
> is different?
>
> DateTimeIndex was created on both columns (Date/Time):
> CREATE INDEX "DateTimeIndex" ON "tablex" USING btree ("Date", "Time");

PostgreSQL is always going to switch at some point, where the number of
rows that have to be read from the table exceed some percentage of the
total rows in the table.

We can possibly be more helpful if you send EXPLAIN ANALYZE, rather than
just EXPLAIN.

A few things to be careful of:

- Is this supposed to be a slice of midnight to 6pm, for each day
between 28 June and 4 July? If you want a continuous period from
Midnight 28 June -> 6pm 4 July you're better to have a single timestamp
field.

- It is unlikely that the , "Time" on your index is adding much to your
selectivity, and it may be that you would be better off without it.

- the DISTINCT can screw up your results, and it usually means that the
SQL is not really the best it could be. A _real_ need for DISTINCT is
quite rare in my experience, and from what I have seen it adds overhead
and tends to encourage bad query plans when used unnecessarily.

Hope this is some help.

Regards,
Andrew McMillan

-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267
Make things as simple as possible, but no simpler -- Einstein
-------------------------------------------------------------------------

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message eleven 2004-07-05 13:46:15 Re: Seq scan vs. Index scan with different query conditions
Previous Message Richard Huxton 2004-07-05 11:41:16 Re: Seq scan vs. Index scan with different query conditions