Re: Yet another abort-early plan disaster on 9.3

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-09-19 17:15:34
Message-ID: CAHyXU0zSCWLYZ34fsgAGn9x+nFEwOih4ve3kyeYonTY6xG1Xbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wed, Sep 17, 2014 at 7:11 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Folks,
>
> Just encountered another case of critical fail for abort-early query
> plans. In this case, it will completely prevent a user from upgrading
> to 9.3; this is their most common query, and on 9.3 it takes 1000X longer.
>
> Maybe we should think about removing abort-early plans from 9.5?
> Clearly we don't understand them well enough for them to work for users.
>
> Query:
>
> SELECT "categories".* FROM "categories" WHERE "categories"."user_id" IN
> ( SELECT to_user_id FROM "tags" WHERE "tags"."from_user_id" = 53529975 )
> ORDER BY recorded_on DESC LIMIT 20;
>
> Here's the plan from 9.1:
>
> Limit (cost=1613.10..1613.15 rows=20 width=194) (actual
> time=0.503..0.509 rows=20 loops=1)
> -> Sort (cost=1613.10..1736.14 rows=49215 width=194) (actual
> time=0.502..0.505 rows=20 loops=1)
> Sort Key: categories.recorded_on
> Sort Method: top-N heapsort Memory: 30kB
> -> Nested Loop (cost=248.80..303.51 rows=49215 width=194)
> (actual time=0.069..0.347 rows=81 loops=1)
> -> HashAggregate (cost=248.80..248.81 rows=1 width=4)
> (actual time=0.050..0.054 rows=8 loops=1)
> -> Index Scan using unique_index_tags on tags
> (cost=0.00..248.54 rows=103 width=4) (actual time=0.020..0.033 rows=8
> loops=1)
> Index Cond: (from_user_id = 53529975)
> -> Index Scan using index_categories_on_user_id on
> categories (cost=0.00..54.34 rows=29 width=194) (actual
> time=0.010..0.028 rows=10 loops=8)
> Index Cond: (user_id = tags.to_user_id)
> Total runtime: 0.641 ms
>
> And from 9.3:
>
> Limit (cost=1.00..2641.10 rows=20 width=202) (actual
> time=9.933..711.372 rows=20 loops=1)
> -> Nested Loop Semi Join (cost=1.00..9641758.39 rows=73041
> width=202) (actual time=9.931..711.361 rows=20 loops=1)
> -> Index Scan Backward using index_categories_on_recorded_on
> on categories (cost=0.43..406943.98 rows=4199200 width=202) (actual
> time=0.018..275.020 rows=170995 loops=1)
> -> Index Scan using unique_index_tags on tags
> (cost=0.57..2.20 rows=1 width=4) (actual time=0.002..0.002 rows=0
> loops=170995)
> Index Cond: ((from_user_id = 53529975) AND (to_user_id =
> categories.user_id))
> Total runtime: 711.457 ms
>
> So, here's what's happening here:
>
> As usual, PostgreSQL is dramatically undercounting n_distinct: it shows
> chapters.user_id at 146,000 and the ratio of to_user_id:from_user_id as
> being 1:105 (as opposed to 1:6, which is about the real ratio). This
> means that PostgreSQL thinks it can find the 20 rows within the first 2%
> of the index ... whereas it actually needs to scan 50% of the index to
> find them.
>
> Removing LIMIT causes 9.3 to revert to the "good" plan, as expected.
>
> This is the core issue with abort-early plans; they depend on our
> statistics being extremely accurate, which we know they are not. And if
> they're wrong, the execution time climbs by 1000X or more. Abort-early
> plans are inherently riskier than other types of query plans.
>
> What I'm not clear on is why upgrading from 9.1 to 9.3 would bring about
> this change. The stats are no more than 10% different across the
> version change.

Amusingly on-topic rant I happened to read immediately after this by chance:

http://wp.sigmod.org/?p=1075

Is there a canonical case of where 'abort early' plans help? (I'm new
to that term -- is it a recent planner innovation...got any handy
links?)

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-09-19 17:18:22 Re: Support for N synchronous standby servers
Previous Message Robert Haas 2014-09-19 17:13:16 Re: Minor improvement in lock.sgml

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2014-09-19 18:40:17 Re: Yet another abort-early plan disaster on 9.3
Previous Message Pavel Stehule 2014-09-19 13:32:00 Re: query a table with lots of coulmns