Why should such a simple query over indexed columns be so slow?

From:	Alessandro Gagliardi <alessandro(at)path(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Why should such a simple query over indexed columns be so slow?
Date:	2012-01-30 19:13:08
Message-ID:	CAAB3BBLmzQvP0rREYJveHo=3OO8zOJJ7eL7pW5StuZKe9kVC-g@mail.gmail.com
Lists:	pgsql-performance

So, here's the query:

SELECT private, COUNT(block_id) FROM blocks WHERE created > 'yesterday' AND
shared IS FALSE GROUP BY private

What confuses me is that, although this is a largish table (millions of
rows) with constant writes, the query only filters on indexed columns of
types timestamp and boolean, so I would expect it to be very fast. The
clause created > 'yesterday' is there mostly to speed it up, but
apparently it doesn't help much.

Here's the *Full Table and Index Schema*:

CREATE TABLE blocks
(
block_id character(24) NOT NULL,
user_id character(24) NOT NULL,
created timestamp with time zone,
locale character varying,
shared boolean,
private boolean,
moment_type character varying NOT NULL,
user_agent character varying,
inserted timestamp without time zone NOT NULL DEFAULT now(),
networks character varying[],
lnglat point,
CONSTRAINT blocks_pkey PRIMARY KEY (block_id )
)

WITH (
OIDS=FALSE
);

CREATE INDEX blocks_created_idx
ON blocks
USING btree
(created DESC NULLS LAST);

CREATE INDEX blocks_lnglat_idx
ON blocks
USING gist
(lnglat );

CREATE INDEX blocks_networks_idx
ON blocks
USING btree
(networks );

CREATE INDEX blocks_private_idx
ON blocks
USING btree
(private );

CREATE INDEX blocks_shared_idx
ON blocks
USING btree
(shared );

Here are the results from *EXPLAIN ANALYZE*:

HashAggregate  (cost=156619.01..156619.02 rows=2 width=26) (actual time=43131.154..43131.156 rows=2 loops=1)
  ->  Seq Scan on blocks  (cost=0.00..156146.14 rows=472871 width=26) (actual time=274.881..42124.505 rows=562888 loops=1)
        Filter: ((shared IS FALSE) AND (created > '2012-01-29 00:00:00+00'::timestamp with time zone))
Total runtime: 43131.221 ms

I'm using *Postgres version:* 9.0.5 (courtesy of Heroku)

As for *History:* I've only recently started using this query, so there
really isn't any.

As for *Hardware*: I'm using Heroku's "Ronin" setup which involves 1.7 GB
Cache. Beyond that I don't really know.

As for *Maintenance Setup*: I let Heroku handle that, so, again, I don't
really know. FWIW though, vacuuming should not really be an issue (as I
understand it) since I don't really do any updates or deletions. It's
pretty much all inserts and selects.

As for *WAL Configuration*: I'm afraid I don't even know what that is. The
query is normally run from a Python web server, but the EXPLAIN above was
run using pgAdmin3; I doubt that's relevant.

As for *GUC Settings*: Again, I don't know what this is. Whatever Heroku
defaults to is what I'm using.

Thank you in advance!
-Alessandro Gagliardi
