Re: [PERFORM] Inaccurate Explain Cost

From: Edson Richter <edsonrichter(at)hotmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [PERFORM] Inaccurate Explain Cost
Date: 2012-09-26 20:20:01
Message-ID: BLU0-SMTP46616B9B80F299E088F5165CF9C0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Em 26/09/2012 17:03, Shaun Thomas escreveu:
> On 09/26/2012 01:38 PM, Robert Sosinski wrote:
>
>> I seem to be getting an inaccurate cost from explain. Here are two
>> examples for one query with two different query plans:
>
> Well, there's this:
>
> Nested Loop (cost=0.00..151986.53 rows=2817 width=4) (actual
> time=163.275..186869.844 rows=43904 loops=1)
>
> If anything's a smoking gun, that is. I could see why you'd want to
> turn off nested loops to get better execution time. But the question
> is: why did it think it would match so few rows in the first place?
> The planner probably would have thrown away this query plan had it
> known it would loop 20x more than it thought.
>
> I think we need to know what your default_statistics_target is set at,
> and really... all of your relevant postgresql settings.
>
> Please see this:
>
> http://wiki.postgresql.org/wiki/Slow_Query_Questions
>
> But you also may need to look a lot more into your query itself. The
> difference between a 2 or a 3 minute query isn't going to help you
> much. Over here, we tend to spend more of our time turning 2 or 3
> minute queries into 20 or 30ms queries. But judging by your date
> range, getting the last 2-months of data from a table that large
> generally won't be fast by any means.
>
> That said, looking at your actual query:
>
> SELECT COUNT(DISTINCT eu.id)
> FROM exchange_uploads eu
> JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
> LEFT JOIN uploads u ON u.id = eu.upload_id
> LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
> LEFT JOIN exchanges e ON e.id = ud.exchange_id
> WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
> AND ud.office_id = 6;
>
> Doesn't need half of these joins. They're left joins, and never used
> in the query results or where criteria. You could just use this:

Interesting. I've similar situation, where user can choose a set of
filters, and then the query must have several left joins "just in case"
(user need in the filer).
I know other database that is able to remove unnecessary outer joins
from queries when they are not relevant and for instance become faster.
Can't PostgreSQL do the same?

Regards,

Edson.

>
> SELECT COUNT(DISTINCT eu.id)
> FROM exchange_uploads eu
> JOIN upload_destinations ud ON (ud.id = eu.upload_destination_id)
> WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
> AND ud.office_id = 6;
>
> Though I presume this is just a count precursor to a query that
> fetches the actul results and does need the left join. Either way, the
> index scan from your second example matches 3.3M rows by using the
> created_at index on exchange_uploads. That's not really very
> restrictive, and so you have two problems:
>
> 1. Your nested loop stats from office_id are somehow wrong. Try
> increasing your stats on that column, or just
> default_statistics_target in general, and re-analyze.
> 2. Your created_at criteria above match way too many rows, and will
> also take a long time to process.
>
> Those are your two actual problems. We can probably get your query to
> run faster, but those are pretty significant hurdles.
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message hubert depesz lubaczewski 2012-09-26 20:21:40 Re: Inaccurate Explain Cost
Previous Message Shaun Thomas 2012-09-26 20:03:03 Re: Inaccurate Explain Cost

Browse pgsql-performance by date

  From Date Subject
Next Message hubert depesz lubaczewski 2012-09-26 20:21:40 Re: Inaccurate Explain Cost
Previous Message Jeff Janes 2012-09-26 20:11:49 Re: Guide to Posting Slow Query Questions