Re: Yet another abort-early plan disaster on 9.3

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-09-19 18:40:17
Message-ID: 541C7891.7000902@agliodbs.com
Lists: pgsql-hackers pgsql-performance

On 09/19/2014 10:15 AM, Merlin Moncure wrote:
> On Wed, Sep 17, 2014 at 7:11 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> This is the core issue with abort-early plans; they depend on our
>> statistics being extremely accurate, which we know they are not. And if
>> they're wrong, the execution time climbs by 1000X or more. Abort-early
>> plans are inherently riskier than other types of query plans.
>>
>> What I'm not clear on is why upgrading from 9.1 to 9.3 would bring about
>> this change. The stats are no more than 10% different across the
>> version change.
>
> Amusingly on-topic rant I happened to read immediately after this by chance:
>
> http://wp.sigmod.org/?p=1075
>
> Is there a canonical case of where 'abort early' plans help? (I'm new
> to that term -- is it a recent planner innovation...got any handy
> links?)

Yeah, here's an example of the canonical case:

Table t1 ( a, b, c )

- "b" is low-cardinality
- "c" is high-cardinality
- There are separate indexes on both b and c.

SELECT a, b, c FROM t1
WHERE b = 2
ORDER BY c LIMIT 1;
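
In SQL terms, the setup might look something like this (just a sketch;
the column types and index names here are hypothetical, only the index
shapes matter):

CREATE TABLE t1 (a int, b int, c int);
CREATE INDEX t1_b_idx ON t1 (b);
CREATE INDEX t1_c_idx ON t1 (c);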

In this case, the fastest plan is usually to scan the index on c in
order and return the first row that also satisfies the filter on b.
This can be an order of magnitude faster than using the index on b,
sorting the matching rows by c, and taking the first one, if (b=2)
happens to match 20% of the table.
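
You can check which of the two shapes the planner actually picks on
your data with EXPLAIN ANALYZE; for instance, against the hypothetical
table sketched above:

EXPLAIN (ANALYZE, BUFFERS)
SELECT a, b, c FROM t1
WHERE b = 2
ORDER BY c LIMIT 1;

Roughly speaking, the abort-early shape shows up as a Limit over an
index scan on c with a filter on b, while the conventional shape shows
up as a Limit over a sort of the rows fetched via the index on b.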

This is called an "abort early" plan because we expect never to finish
the scan on the index on c: we scan it in order, stop at the first row
that also matches b=2, and exit.

The problem with such plans is that they are "risky". As in, if we are
wrong about our (b=2) stats, then we've just adopted a query plan which
will be 10X to 1000X slower than the more conventional plan.
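
Whether that gamble pays off depends on how accurate the planner's
estimate of the frequency of (b=2) is. One way to see what it is
working from (again using the hypothetical table above) is to look at
pg_stats:

SELECT most_common_vals, most_common_freqs, n_distinct
FROM pg_stats
WHERE tablename = 't1' AND attname = 'b';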

We can see this in the bad plan I posted:

Limit  (cost=1.00..2641.10 rows=20 width=202) (actual time=9.933..711.372 rows=20 loops=1)
  ->  Nested Loop Semi Join  (cost=1.00..9641758.39 rows=73041 width=202) (actual time=9.931..711.361 rows=20 loops=1)
        ->  Index Scan Backward using index_categories_on_recorded_on on categories  (cost=0.43..406943.98 rows=4199200 width=202) (actual time=0.018..275.020 rows=170995 loops=1)

Notice how the total cost of the plan is a small fraction of the cost
of the two steps beneath it? This is an indication that the planner
expects to be able to abort the index scan and the nestloop join before
it's more than 2% of the way through them.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
