Re: Simple query not using index: why?

From: William Garrison <postgres(at)mobydisk(dot)com>
To: Adrian Klaver <aklaver(at)comcast(dot)net>
Cc: Postgres General List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Simple query not using index: why?
Date: 2008-09-03 19:55:17
Message-ID: 48BEEBA5.3030108@mobydisk.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Can't it just scan the index to get that? I assumed the index had links
to every fileid in the table. In my over-simplified imagination, the
table looks like this:

ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
etc.

While the index looks like
fileid|ctid
fileid|ctid
fileid|ctid
fileid|ctid
...

So I expected scanning the index was faster, and still had everything it
needed to do the count. Or perhaps it was because I said COUNT(*) so it
needs to look at the other columns in the table? I really just wanted
the number of "hits" not the number of records with distinct values or
anything like that. My understanding was that COUNT(*) did that, and
didn't really look at the columns themselves.

Adrian Klaver wrote:
> -------------- Original message ----------------------
> From: William Garrison <postgres(at)mobydisk(dot)com>
>
>> I am looking for records with duplicate keys, so I am running this query:
>>
>> SELECT
>> fileid, COUNT(*)
>> FROM
>> file
>> GROUP BY
>> fileid
>> HAVING
>> COUNT(*)>1
>>
>> The table has an index on fileid (non-unique index) so I am surprised
>> that postgres is doing a table scan. This database is >15GB, and there
>> are a number of fairly large string columns in the table. I am very
>> surprised that scanning the index is not faster than scanning the
>> table. Any thoughts on that? Is scanning the table faster than
>> scanning the index? Is there a reason that it needs anything other than
>> the index?
>>
>>
>
> I may be missing something, but it would have to scan the entire table to get all the occurrences of each fileid in order to do the count(*).
>
>
>
> --
> Adrian Klaver
> aklaver(at)comcast(dot)net
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joshua Drake 2008-09-03 20:01:30 Re: Simple query not using index: why?
Previous Message Tony Caduto 2008-09-03 19:54:52 Re: SELECT INTO returns incorrect values