Re: select distinct and index usage

From: "David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com>
To: "Alban Hertroys" <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: select distinct and index usage
Date: 2008-04-07 07:47:17
Message-ID: e7f9235d0804070047h6c102042u276902e992784c3c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Apr 7, 2008 at 2:05 AM, Alban Hertroys
<dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:
> On Apr 7, 2008, at 1:32 AM, David Wilson wrote:
>
> >
>
> The databases estimates seem consistent with yours, so why is it doing
> this? Could you provide an EXPLAIN ANALYSE? It shows the actual numbers next
> to the estimates, although I figure that query might take a while...

explain analyze select distinct datestamp from vals;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=15003047.47..15380004.83 rows=4263 width=4) (actual
time=649599.159..721671.595 rows=4252 loops=1)
-> Sort (cost=15003047.47..15191526.15 rows=75391472 width=4)
(actual time=649599.157..694392.602 rows=75391476 loops=1)
Sort Key: datestamp
Sort Method: external merge Disk: 1178592kB
-> Seq Scan on vals (cost=0.00..1531261.72 rows=75391472
width=4) (actual time=9.104..93130.468 rows=75391476 loops=1)
Total runtime: 722379.434 ms

(There were a couple other very long-running, disk-intensive queries
going on in the background of this, so that runtime is a little
inflated, but the values should still all be relevant.)

> Pg estimates the costs quite high too. It's almost as if there isn't an
> index on that column and it has no other way then doing a sequential scan...
> Could you show us the table definition and its indexes? What version of Pg
> is this?

Pg is 8.3.1

Table definition:
CREATE TABLE vals (
sid integer NOT NULL,
eid integer NOT NULL,
datestamp integer NOT NULL,
val_dur integer NOT NULL,
acc real NOT NULL,
yld real NOT NULL,
rt real NOT NULL,
ydev real NOT NULL,
vydev real NOT NULL,
adev real NOT NULL,
achange real NOT NULL,
ychange real NOT NULL,
arsi real NOT NULL,
yrsi real NOT NULL,
UNIQUE (sid,eid,val_dur,datestamp),
FOREIGN KEY (sid,eid,datestamp) REFERENCES preds
(sid,eid,datestamp) ON DELETE CASCADE
);
create index val_datestamp_idx on vals(datestamp);
create index val_evaluator_idx on vals(eid);
create index val_search_key on vals(val_dur,eid,datestamp);
create index val_vd_idx on vals(val_dur,datestamp);

(The various indices are for a variety of common queries into the table)

> It may be that your index on vals.datestamp doesn't fit into memory; what
> are the relevant configuration parameters for your database?

That's a very good question. I recently had to rebuild this particular
database and haven't played with the configuration parameters as much
as I'd like- what parameters would be most relevant here? I hadn't
realized that an index needed to fit into memory.

pg_total_relation_size('vals') - pg_relation_size('vals') gives 11gb.
All indexed columns are integers. My guess is that this means that
it's likely the index doesn't fit into memory.

--
- David T. Wilson
Princeton Satellite Systems
david(dot)t(dot)wilson(at)gmail(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dave Page 2008-04-07 07:55:19 Re: edb-debugger, debugging pl/pgsql
Previous Message Albe Laurenz 2008-04-07 07:36:25 Re: Connection reset by peer / broken pipe