Re: BUG #14469: Wrong cost estimates for merge append plan with partitions.

From: Alexey Ermakov <alexey(dot)ermakov(at)postgresql-consulting(dot)com>
To: maxim(dot)boguk(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14469: Wrong cost estimates for merge append plan with partitions.
Date: 2016-12-17 11:34:13
Message-ID: 585522b7.4391190a.fcd8e.a4ed@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

I think there is a problem with estimation of startup_cost for Merge
Append node.

Currently for it's estimation we're using sum of startup_cost of child
nodes (which could be really small but in fact I think we require each
node to produce at least 1 row before merge append could pass them to
the parent node).

https://github.com/postgres/postgres/blob/34ca0905706422c191b3b0afef6e1c5f54399833/src/backend/optimizer/path/costsize.c#L1592
https://github.com/postgres/postgres/blob/0832f2db68cc43524a240db47d0428cc9525723e/src/backend/optimizer/util/pathnode.c#L1259

I think we should change that estimation and in case when child node
expected to return only one row (and really 0 rows) we should use it's
total_cost as base for estimation of startup_cost.

--
Alexey Ermakov

On 12/17/16 16:53, maxim(dot)boguk(at)gmail(dot)com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 14469
> Logged by: Maksym Boguk
> Email address: maxim(dot)boguk(at)gmail(dot)com
> PostgreSQL version: 9.6.1
> Operating system: Linux
> Description:
>
> The problem query (which never finishes because of huge tables involved):
>
> explain SELECT "feed_actions".* FROM "feed_actions" WHERE (requested_at >
> '2016-12-17 09:44:17.042659') ORDER BY "feed_actions"."created_at" DESC
> LIMIT 1;
> QUERY
> PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=2.47..853.03 rows=1 width=697)
> -> Merge Append (cost=2.47..10603118.78 rows=12466 width=697)
> Sort Key: feed_actions.created_at DESC
> -> Index Scan Backward using feed_actions_created_at_idx on
> feed_actions (cost=0.12..8.14 rows=1 width=1184)
> Filter: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan Backward using feed_actions_2016_09_created_at_idx
> on feed_actions_2016_09 (cost=0.56..3212964.64 rows=1 width=202)
> Filter: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan Backward using feed_actions_2016_10_created_at_idx
> on feed_actions_2016_10 (cost=0.56..3612797.62 rows=1 width=202)
> Filter: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan Backward using feed_actions_2016_11_created_at_idx
> on feed_actions_2016_11 (cost=0.56..2998680.21 rows=1 width=214)
> Filter: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan Backward using feed_actions_2016_12_created_at_idx
> on feed_actions_2016_12 (cost=0.43..778269.49 rows=12442 width=696)
> Filter: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan Backward using feed_actions_2017_01_created_at_idx
> on feed_actions_2017_01 (cost=0.14..45.19 rows=20 width=1184)
> Filter: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
>
>
> In the same time (for all partitions except feed_actions_2016_12) there are
> no rows with requested_at > '2016-12-17 09:44:17.042659':
> explain analyze select * from feed_actions_2016_09 "feed_actions" WHERE
> (requested_at > '2016-12-17 09:44:17.042659');
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Index Scan using feed_actions_2016_09_requested_at_idx on
> feed_actions_2016_09 feed_actions (cost=0.56..4.60 rows=1 width=202)
> (actual time=0.006..0.006 rows=0 loops=1)
> Index Cond: (requested_at > '2016-12-17 09:44:17.042659'::timestamp
> without time zone)
> Planning time: 0.094 ms
> Execution time: 0.024 ms
>
> What's more, selecting from the single partition with the same order
> by/limit - producing expected/fast plan:
> explain analyze select * from feed_actions_2016_09 "feed_actions" WHERE
> (requested_at > '2016-12-17 09:44:17.042659') ORDER BY
> "feed_actions"."created_at" DESC LIMIT 1;;
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=4.61..4.61 rows=1 width=202) (actual time=0.012..0.012 rows=0
> loops=1)
> -> Sort (cost=4.61..4.61 rows=1 width=202) (actual time=0.011..0.011
> rows=0 loops=1)
> Sort Key: created_at DESC
> Sort Method: quicksort Memory: 25kB
> -> Index Scan using feed_actions_2016_09_requested_at_idx on
> feed_actions_2016_09 feed_actions (cost=0.56..4.60 rows=1 width=202)
> (actual time=0.006..0.006 rows=0 loops=1)
> Index Cond: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> Planning time: 0.126 ms
> Execution time: 0.052 ms
>
>
> So statistics - are correct, but somehow the database think that search for
> the 0 (may by 1) row via scanning whole partition with backward index scan -
> will be fast.
> Question - what's wrong with merge append cost estimates in that case?
> Play with planner cost* setting - have no effect on the plan selection.
>
>
>
> PS: good/fast plan could be produced via common hack:
> explain analyze SELECT "feed_actions".* FROM "feed_actions" WHERE
> (requested_at > '2016-12-17 09:44:17.042659') ORDER BY
> "feed_actions"."created_at"+'0 second'::interval DESC LIMIT 1;
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=1212.68..1212.69 rows=1 width=705) (actual time=32.134..32.134
> rows=1 loops=1)
> -> Sort (cost=1212.68..1243.86 rows=12470 width=705) (actual
> time=32.133..32.133 rows=1 loops=1)
> Sort Key: ((feed_actions.created_at + '00:00:00'::interval)) DESC
> Sort Method: top-N heapsort Memory: 25kB
> -> Result (cost=0.00..1150.33 rows=12470 width=705) (actual
> time=0.052..24.068 rows=8397 loops=1)
> -> Append (cost=0.00..994.46 rows=12470 width=697) (actual
> time=0.047..16.364 rows=8397 loops=1)
> -> Seq Scan on feed_actions (cost=0.00..0.00 rows=1
> width=1184) (actual time=0.003..0.003 rows=0 loops=1)
> Filter: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan using
> feed_actions_2016_09_requested_at_idx on feed_actions_2016_09
> (cost=0.56..4.60 rows=1 width=202) (actual time=0.007..0.007 rows=0
> loops=1)
> Index Cond: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan using
> feed_actions_2016_10_requested_at_idx on feed_actions_2016_10
> (cost=0.56..4.60 rows=1 width=202) (actual time=0.004..0.004 rows=0
> loops=1)
> Index Cond: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan using
> feed_actions_2016_11_requested_at_idx on feed_actions_2016_11
> (cost=0.56..6.85 rows=1 width=214) (actual time=0.004..0.004 rows=0
> loops=1)
> Index Cond: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Index Scan using
> feed_actions_2016_12_requested_at_idx on feed_actions_2016_12
> (cost=0.43..967.66 rows=12446 width=696) (actual time=0.027..13.166
> rows=8397 loops=1)
> Index Cond: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> -> Seq Scan on feed_actions_2017_01 (cost=0.00..10.75
> rows=20 width=1184) (actual time=0.005..0.005 rows=0 loops=1)
> Filter: (requested_at > '2016-12-17
> 09:44:17.042659'::timestamp without time zone)
> Planning time: 1.020 ms
> Execution time: 32.370 ms
>
>
>
>
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Meskes 2016-12-17 15:32:51 Re: BUG #14460: ecpg dynamic describe data types incorrect
Previous Message maxim.boguk 2016-12-17 10:53:46 BUG #14469: Wrong cost estimates for merge append plan with partitions.