Re: Why should such a simple query over indexed columns be so slow?

From: Alessandro Gagliardi <alessandro(at)path(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why should such a simple query over indexed columns be so slow?
Date: 2012-01-30 20:35:04
Message-ID: CAAB3BBLcBnuz+Domkvkmw5acDys9+cwWNi+8CNorjUd02WJMNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Well that was a *lot* faster:

"HashAggregate (cost=156301.82..156301.83 rows=2 width=26) (actual
time=2692.806..2692.807 rows=2 loops=1)"
" -> Bitmap Heap Scan on blocks (cost=14810.54..155828.95 rows=472871
width=26) (actual time=289.828..1593.893 rows=575186 loops=1)"
" Recheck Cond: (created > '2012-01-29 00:00:00+00'::timestamp with
time zone)"
" Filter: (shared IS FALSE)"
" -> Bitmap Index Scan on blocks_created_idx (cost=0.00..14786.89
rows=550404 width=0) (actual time=277.407..277.407 rows=706663 loops=1)"
" Index Cond: (created > '2012-01-29 00:00:00+00'::timestamp
with time zone)"
"Total runtime: 2693.107 ms"

To answer your (non-)question about Heroku, it's a cloud service, so I
don't host PostgreSQL myself. I'm not sure how much I can mess with things
like GUC since I don't even have access to the "postgres" database on the
server. I am a long time SQL user but new to Postgres so I welcome
suggestions on where to start with that sort of thing.
Setting enable_seqscan=false made a huge difference, so I think I'll start
there.

Thank you very much!
-Alessandro

On Mon, Jan 30, 2012 at 11:24 AM, Claudio Freire <klaussfreire(at)gmail(dot)com>wrote:

> On Mon, Jan 30, 2012 at 4:13 PM, Alessandro Gagliardi
> <alessandro(at)path(dot)com> wrote:
> > So, here's the query:
> >
> > SELECT private, COUNT(block_id) FROM blocks WHERE created > 'yesterday'
> AND
> > shared IS FALSE GROUP BY private
> >
> > What confuses me is that though this is a largish table (millions of
> rows)
> > with constant writes, the query is over indexed columns of types
> timestamp
> > and boolean so I would expect it to be very fast. The clause where
> created >
> > 'yesterday' is there mostly to speed it up, but apparently it doesn't
> help
> > much.
>
> The number of rows touched is ~0.5M, and is correctly estimated, which
> would lead me to believe PG estimates the index plan to be slower.
>
> You could try by executing first "set enable_seqscan=false;" and then
> your query with explain analyze again. You'll probably get an index
> scan, and you'll see both how it performs and how PG thought it would
> perform. Any mismatch between the two probably means you'll have to
> change the planner tunables (the x_tuple_cost ones) to better match
> your hardware.
>
>
> > As for Hardware: I'm using Heroku's "Ronin" setup which involves 1.7 GB
> > Cache. Beyond that I don't really know.
> snip
> > As for GUC Settings: Again, I don't know what this is. Whatever Heroku
> > defaults to is what I'm using.
>
> And there's your problem. Without knowing/understanding those, you
> won't get anywhere. I don't know what Heroku is, but you should find
> out both hardware details and PG configuration details.
>
> > As for Maintenance Setup: I let Heroku handle that, so I again, I don't
> > really know. FWIW though, vacuuming should not really be an issue (as I
> > understand it) since I don't really do any updates or deletions. It's
> pretty
> > much all inserts and selects.
>
> Maintainance also includes analyzing the table, to gather stats that
> feed the optimizer, and it's very important to keep the stats
> accurate. You can do it manually - just perform an ANALYZE. However,
> the plan doesn't show any serious mismatch between expected and actual
> rowcounts, which suggests stats aren't your problem.
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2012-01-30 20:50:21 Re: Why should such a simple query over indexed columns be so slow?
Previous Message Jeff Janes 2012-01-30 20:32:25 Re: How to improve insert speed with index on text column