Re: Query is not using index when it should

From: tomas(at)nocrew(dot)org (Tomas =?iso-8859-1?q?Sk=E4re?=)
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org, pgsql-sql(at)postgresql(dot)org
Subject: Re: Query is not using index when it should
Date: 2004-12-11 14:17:13
Message-ID: 80d5xgudg6.fsf@junk.nocrew.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance pgsql-sql

Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:

> On Fri, 10 Dec 2004, Tomas [iso-8859-1] Skäre wrote:
>
> > I have a table that looks like this:
> >
> > Table "public.cjm_object"
> > Column | Type | Modifiers
> > -----------+-------------------+-----------
> > timestamp | bigint | not null
> > jobid | bigint | not null
> > objectid | bigint | not null
> > class | integer | not null
> > field | character varying | not null
>
> In 7.4.x and earlier, you need to cast the value you're comparing to into
> a bigint in order to make sure the indexes are used (in your timestamp
> case it appears to work because the value doesn't fit in a plain integer).
> 8.0 should handle this better.

Thanks, casting worked well for that query. Now, could someone please
help me to get this query faster? With the 283465 rows, it takes far
too long time, I think. This is on a 2GHz Celeron running Linux 2.6.
shared_buffers=1000, sort_mem=1024.

select c.* from cjm_object c
inner join
(select max(timestamp) as timestamp,objectid,field from cjm_object
group by objectid,field) t
using(timestamp,objectid,field)
where 1=1 and data is not null
order by objectid,field;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=145511.85..150759.75 rows=1 width=54) (actual time=17036.147..20968.811 rows=208246 loops=1)
Merge Cond: (("outer".objectid = "inner".objectid) AND ("outer"."?column7?" = "inner"."?column4?") AND ("outer"."timestamp" = "inner"."timestamp"))
-> Sort (cost=47007.75..47611.06 rows=241324 width=54) (actual time=5113.099..5586.094 rows=236710 loops=1)
Sort Key: c.objectid, (c.field)::text, c."timestamp"
-> Seq Scan on cjm_object c (cost=0.00..5862.65 rows=241324 width=54) (actual time=0.129..1788.125 rows=236710 loops=1)
Filter: (data IS NOT NULL)
-> Sort (cost=98504.09..99212.75 rows=283465 width=48) (actual time=11922.081..12427.683 rows=255001 loops=1)
Sort Key: t.objectid, (t.field)::text, t."timestamp"
-> Subquery Scan t (cost=45534.39..51912.35 rows=283465 width=48) (actual time=5484.943..9289.061 rows=255001 loops=1)
-> GroupAggregate (cost=45534.39..49077.70 rows=283465 width=25) (actual time=5484.925..8178.531 rows=255001 loops=1)
-> Sort (cost=45534.39..46243.05 rows=283465 width=25) (actual time=5484.285..6324.067 rows=283465 loops=1)
Sort Key: objectid, field
-> Seq Scan on cjm_object (cost=0.00..5862.65 rows=283465 width=25) (actual time=0.124..852.749 rows=283465 loops=1)
Total runtime: 21161.144 ms

Quick explanation of the query:

Each row in the table is a field, which is part of an object. Ex:

timestamp objectid field data
1 1 name test
1 1 type something
1 2 name test2
1 2 type whatever

Timestamp is when the entry was inserted in the databas. When updating
a single field for an object, a new line with the new value is added,
data set to NULL if the field is deleted. So the above content could
now be:

timestamp objectid field data
1 1 name test
1 1 type something
1 2 name test2
1 2 type whatever
2 1 name newname
2 1 type <NULL>

Now, the query picks out the highest timestamp for each
(objectid,field) and then selects all columns for each match,
filtering out NULL data and ordering per objectid.

Is there any way to make this query faster? I've tried rewriting it,
putting the subquery as EXISTS condition, but it doesn't make it
faster. I've tried to create different indices, but they don't seem to
be used in this query.

Greetings,

Tomas

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Steinar H. Gunderson 2004-12-11 14:32:13 Re: [GENERAL] Query is not using index when it should
Previous Message Christopher Browne 2004-12-11 13:52:57 Re: Shared disk

Browse pgsql-performance by date

  From Date Subject
Next Message Steinar H. Gunderson 2004-12-11 14:32:13 Re: [GENERAL] Query is not using index when it should
Previous Message Josh Berkus 2004-12-11 05:52:40 Re: Partitioned table performance

Browse pgsql-sql by date

  From Date Subject
Next Message Steinar H. Gunderson 2004-12-11 14:32:13 Re: [GENERAL] Query is not using index when it should
Previous Message NosyMan 2004-12-11 12:47:09 PREPARED STATEMENT