Re: group by will not use an index?

From: Brian Herlihy <btherl(at)yahoo(dot)com(dot)au>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: group by will not use an index?
Date: 2007-01-10 01:07:03
Message-ID: 20070110010703.7815.qmail@web52315.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Actually, as I recently discovered, GROUP BY is faster than DISTINCT. It's just due to how they are implemented, so don't go looking for any deep reason :) The thread "GROUP BY vs DISTINCT" from 2006-12-20 discusses it. DISTINCT sorts the results to find the unique rows, but GROUP BY uses a hash.

Brian

----- Original Message ----
From: Adam Rich <adam(dot)r(at)sbcglobal(dot)net>
To: tsuraan <tsuraan(at)gmail(dot)com>; pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Sent: Wednesday, 10 January, 2007 7:32:50 AM
Subject: Re: [PERFORM] group by will not use an index?

Message

That
query looks strange to me (a group by without an aggregate). See if this
is

any
faster:

SELECT
DISTINCT DATE(inserted) FROM Messages

I
won't hold my breath though, I don't think there's any way around the full table
scan

in
Postgres, because the index does not contain enough information about
transactional

state,
so table access is always required (unlike virtually every other type of
db)

-----Original Message-----
From:
pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of
tsuraan
Sent: Tuesday, January 09, 2007 5:06 PM
To:
pgsql-performance
Subject: [PERFORM] group by will not use an
index?

I have a table of messages with paths and inserted
dates (among other things), like so:

CREATE TABLE Messages
(
msgkey BIGSERIAL PRIMARY KEY,

path TEXT NOT NULL,
inserted TIMESTAMP WITHOUT TIMEZONE
DEFAULT NOW()
);

I run a query to determine which days actually saw
emails come in, like so:

SELECT DATE(inserted) FROM Messages GROUP BY
DATE(inserted);

That's obviously not very efficient, so I made an
index:

CREATE INDEX messages_date_inserted_ind ON
Messages(DATE(inserted));

However, GROUP BY does not use this
index:

=# explain analyze select date(inserted) from messages group by
date(inserted);

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------
HashAggregate
(cost=104773.10..104789.51 rows=1313 width=8) (actual
time=31269.476..31269.557 rows=44 loops=1)
-> Seq
Scan on messages (cost=0.00..101107.25 rows=1466340 width=8) (actual
time=23.923..25248.400 rows=1467036 loops=1)
Total runtime: 31269.735
ms
(3 rows)

Is it possible to get pg to use an index in a group
by? I don't see why it wouldn't be possible, but maybe I'm missing
something.

Using pg 8.1.4...

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Steinar H. Gunderson 2007-01-10 01:11:04 Re: group by will not use an index?
Previous Message Scott Marlowe 2007-01-09 23:42:04 Re: group by will not use an index?