Re: Query-Planer from 6seconds TO DAYS

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Böckler Andreas <andy(at)boeckler(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query-Planer from 6seconds TO DAYS
Date: 2012-10-25 16:20:56
Message-ID: CAMkU=1yeAAbeK9mo3nwFRFqpV+EeL1w3a6KDfqrtbq9SS0fQDg@mail.gmail.com
Lists: pgsql-performance

On Wed, Oct 24, 2012 at 11:51 AM, Böckler Andreas <andy(at)boeckler(dot)org> wrote:
>>
>> Was there more to the plan that you snipped? If not, why isn't it
>> checking all the other partitions?
>
> You're right. It's checking all partitions! So constraint exclusion doesn't kick in.
> This can be fixed with
> SELECT
> m.machine_id, s.timestamp, s.errorcode
> FROM
> events m
> INNER JOIN spsdata as s ON (m.machine_id=s.machine_id AND s.timestamp BETWEEN m.timestamp - interval '30 seconds' AND m.timestamp)
> WHERE
> m.code IN (2024)
> AND m.timestamp BETWEEN '2012-08-01' AND '2012-08-29'
> AND s.timestamp BETWEEN '2012-08-01' AND '2012-08-29'
> AND s.errorcode in ('2024');

Even checking all the partitions it seemed to be pretty fast (78 ms).
Is it worth adding all of that spinach (which could easily get out of
date) just to improve a query that is already fast?

>
> It doesn't take hours to end, but it's not the performance gain you would expect.
>
> I've changed the query to one partition spsdata_2012m08 and attached the slow and fast cases with EXPLAIN ANALYZE.
>
> The difference is one day in the WHERE-Clause
> 290.581 ms vs. 687887.674 ms!
> That's 2372 times slower.

From the fast case:

->  Bitmap Index Scan on spsdata_2012m08_machine_id_key
      (cost=0.00..2338.28 rows=56026 width=0)
      (actual time=0.262..0.262 rows=6 loops=186)
      Index Cond: ((s.machine_id = m.machine_id) AND
        (s."timestamp" > (m."timestamp" - '00:00:30'::interval)) AND
        (s."timestamp" <= m."timestamp"))

The difference between predicted and actual rows, 56026 versus 6, is
pretty striking. That is why the cost of the fast method is vastly
overestimated; making the range just slightly bigger pushes it over
the edge into looking more expensive than the slower sequential scan.
It does seem to be a case of the range selectivity not being
estimated correctly.

> How can i force the fast query plan in a select?

I'd probably punt and do it in the application code. Do the select on
the event table, then loop over the results, issuing the queries on the
spsdata table. That way the range endpoints would be constants rather
than coming from joins, and the planner should do a better job.
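A minimal sketch of that application-side loop, assuming a DB-API 2.0 driver such as psycopg2 (connection handling omitted, and the 30-second window and errorcode taken from the queries earlier in the thread):

```python
# Fetch the matching events first, then query spsdata once per event.
# Because both range endpoints arrive as bound constants rather than
# join-derived values, the planner can estimate the timestamp range
# selectivity directly for each per-event query.
# Assumes a DB-API 2.0 driver (e.g. psycopg2); `conn` is an open connection.

EVENT_SQL = """
    SELECT machine_id, "timestamp"
    FROM events
    WHERE code = 2024
      AND "timestamp" BETWEEN %s AND %s
"""

SPSDATA_SQL = """
    SELECT machine_id, "timestamp", errorcode
    FROM spsdata
    WHERE machine_id = %s
      AND "timestamp" BETWEEN %s - interval '30 seconds' AND %s
      AND errorcode = 2024
"""

def lookup_errors(conn, start, end):
    """Yield spsdata rows from the 30 seconds before each matching event."""
    with conn.cursor() as cur:
        cur.execute(EVENT_SQL, (start, end))
        events = cur.fetchall()
    for machine_id, ts in events:
        with conn.cursor() as cur:
            # Constant endpoints: ts is a literal value by the time the
            # planner sees this query, not an expression over a join.
            cur.execute(SPSDATA_SQL, (machine_id, ts, ts))
            yield from cur.fetchall()
```

The extra round trips cost something, but each per-event query can use the fast bitmap index scan plan shown above.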

Can you load the data into 9.2 and see if it does better? (I'm not
optimistic that it will be.)

> I've played with seq_page_cost and enable_seqscan already, but you have to know the right values before SELECT to get good results ;)

Not sure what you mean here. If you change the settings just for the
query, it should be safe: when the query is already fast it is not
using the seq scan, so discouraging the seq scan even further is not
going to do any harm.
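For what it's worth, a setting can be scoped to a single transaction with SET LOCAL, so it reverts automatically and other queries keep the defaults (a sketch using the query from earlier in the thread):

```sql
BEGIN;
-- SET LOCAL reverts at COMMIT or ROLLBACK, so only this one
-- query runs with the seq scan discouraged.
SET LOCAL enable_seqscan = off;
SELECT m.machine_id, s.timestamp, s.errorcode
FROM events m
INNER JOIN spsdata AS s
        ON m.machine_id = s.machine_id
       AND s.timestamp BETWEEN m.timestamp - interval '30 seconds'
                           AND m.timestamp
WHERE m.code IN (2024)
  AND m.timestamp BETWEEN '2012-08-01' AND '2012-08-29';
COMMIT;
```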

Or do you mean you have lots of queries which are slow other than the
one shown, and you can't track all of them down?

Cheers,

Jeff
