Optimizer questions

From: konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Optimizer questions
Date: 2016-01-05 14:55:28
Message-ID: 9879B786-E011-44A1-91B8-54649B84106D@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

I want to ask two questions about PostgreSQL optimizer.
I have the following query:

SELECT o.id as id,s.id as sid,o.owner,o.creator,o.parent_id as dir_id,s.mime_id,m.c_type,s.p_file,s.mtime,o.ctime,o.name,o.title,o.size,o.deleted,la.otime,la.etime,uo.login as owner_login,uc.login as creator_login,(CASE WHEN f.user_id IS NULL THEN 0 ELSE 1 END) AS flagged,(select 'userid\\:'||string_agg(user_id,' userid\\:') from get_authorized_users(o.id)) as acl FROM objects s JOIN objects o ON s.original_file=o.id LEFT JOIN flags f ON o.id=f.obj_id AND o.owner=f.user_id LEFT JOIN objects_last_activity la ON o.id = la.obj_id AND o.owner = la.user_id, mime m, users uc , users uo WHERE (s.mime_id=904 or s.mime_id=908) AND m.mime_id = o.mime_id AND o.owner = uo.user_id AND o.creator = uc.user_id ORDER BY s.mtime LIMIT 9;

It is executed more than one hours. And the same query with "limit 8" is executed in just few seconds.
get_authorized_users is quite expensive function performing recursive query through several tables.

In first case (limit 9) sequential scan with sort is used while in the second case index scan on "mtime" index is performed and so no sort is needed.
In the second case we can extract just 9 rows and calculate get_authorized_users for them. In the first case we have to calculate this functions for ALL (~15M) rows.

I investigated plans of both queries and find out there are three reasons of the problem:

1. Wrong estimation of filter selectivity: 65549 vs. 154347 real.
2. Wrong estimation of joins selectivity: 284 vs. 65549 real
3. Wrong estimation of get_authorized_users cost: 0.25 vs. 57.379 real

Below is output of explain analyse:

Limit (cost=1595355.77..1595355.79 rows=9 width=301) (actual time=3823128.752..3823128.755 rows=9 loops=1)
-> Sort (cost=1595355.77..1595356.48 rows=284 width=301) (actual time=3823128.750..3823128.753 rows=9 loops=1)
Sort Key: s.mtime
Sort Method: top-N heapsort Memory: 30kB
-> Nested Loop (cost=1.96..1595349.85 rows=284 width=301) (actual time=75.453..3822829.640 rows=65549 loops=1)
-> Nested Loop Left Join (cost=1.54..1589345.66 rows=284 width=271) (actual time=1.457..59068.107 rows=65549 loops=1)
-> Nested Loop Left Join (cost=0.98..1586923.67 rows=284 width=255) (actual time=1.430..55661.926 rows=65549 loops=1)
Join Filter: (((o.id)::text = (f.obj_id)::text) AND ((o.owner)::text = (f.user_id)::text))
Rows Removed by Join Filter: 132670698
-> Nested Loop (cost=0.98..1576821.09 rows=284 width=236) (actual time=0.163..27721.566 rows=65549 loops=1)
Join Filter: ((o.mime_id)::integer = (m.mime_id)::integer)
Rows Removed by Join Filter: 48768456
-> Seq Scan on mime m (cost=0.00..13.45 rows=745 width=30) (actual time=0.008..1.372 rows=745 loops=1)
-> Materialize (cost=0.98..1573634.65 rows=284 width=214) (actual time=0.004..22.918 rows=65549 loops=745)
-> Nested Loop (cost=0.98..1573633.23 rows=284 width=214) (actual time=0.142..7095.554 rows=65549 loops=1)
-> Nested Loop (cost=0.56..1570232.37 rows=406 width=184) (actual time=0.130..6468.376 rows=65549 loops=1)
-> Seq Scan on objects s (cost=0.00..1183948.58 rows=45281 width=63) (actual time=0.098..5346.023 rows=154347 loops=1)
Filter: (((mime_id)::integer = 904) OR ((mime_id)::integer = 908))
Rows Removed by Filter: 15191155
-> Index Scan using objects_pkey on objects o (cost=0.56..8.52 rows=1 width=140) (actual time=0.006..0.007 rows=0 loops=154347)
Index Cond: ((id)::text = (s.original_file)::text)
-> Index Scan using users_pkey on users uc (cost=0.42..8.37 rows=1 width=49) (actual time=0.008..0.009 rows=1 loops=65549)
Index Cond: ((user_id)::text = (o.creator)::text)
-> Materialize (cost=0.00..48.36 rows=2024 width=38) (actual time=0.001..0.133 rows=2024 loops=65549)
-> Seq Scan on flags f (cost=0.00..38.24 rows=2024 width=38) (actual time=0.009..0.462 rows=2024 loops=1)
-> Index Scan using objects_last_activity_unique_index on objects_last_activity la (cost=0.56..8.52 rows=1 width=54) (actual time=0.044..0.046 rows=1 loops=65549)
Index Cond: (((o.id)::text = (obj_id)::text) AND ((o.owner)::text = (user_id)::text))
-> Index Scan using users_pkey on users uo (cost=0.42..8.37 rows=1 width=49) (actual time=0.015..0.019 rows=1 loops=65549)
Index Cond: ((user_id)::text = (o.owner)::text)
SubPlan 1
-> Aggregate (cost=12.75..12.76 rows=1 width=32) (actual time=57.390..57.391 rows=1 loops=65549)
-> Function Scan on get_authorized_users (cost=0.25..10.25 rows=1000 width=32) (actual time=57.379..57.379 rows=2 loops=65549)
Total runtime: 3823133.235 ms
(33 rows)

Then I try to manually adjust cost of function get_authorized_users using "alter function ... cost" statement.
It has effect on plans shown by analyze. Now plan with index scan has much small cost (2.39..1911772.85) than plan with sort (8507669.11..8507669.13).
But still sort is used for query execution unless I set "enable_sort=off".

I tried to investigate work of optimizer and find out that this choice is made in grouping_planner function:

/*
* Forget about the presorted path if it would be cheaper to sort the
* cheapest-total path. Here we need consider only the behavior at
* the tuple_fraction point. Also, limit_tuples is only relevant if
* not grouping/aggregating, so use root->limit_tuples in the
* cost_sort call.
*/
if (sorted_path)
{
Path sort_path; /* dummy for result of cost_sort */

if (root->query_pathkeys == NIL ||
pathkeys_contained_in(root->query_pathkeys,
cheapest_path->pathkeys))
{
/* No sort needed for cheapest path */
sort_path.startup_cost = cheapest_path->startup_cost;
sort_path.total_cost = cheapest_path->total_cost;
}
else
{
/* Figure cost for sorting */
cost_sort(&sort_path, root, root->query_pathkeys,
cheapest_path->total_cost,
path_rows, path_width,
0.0, work_mem, root->limit_tuples);
}

if (compare_fractional_path_costs(sorted_path, &sort_path,
tuple_fraction) > 0)
{
/* Presorted path is a loser */
sorted_path = NULL;
}
}

In this case cost of sorted path is higher than of seq scan path and so sorted path is rejected.
So now two questions:

1. The cost compared in grouping_planner doesn't take in account price of get_authorized_users - it is not changed when I am altering function cost. Is it correct behavior?

2. Actually there is no need to calculate get_authorized_users functions for ALL records before sort. This field is not involved in any calculations. So we can perform sort, extract 9 rows and then calculate get_authorized_users only for them. I wonder why PostgreSQL optimizer is not trying to split target list into columns which are needed immediately and which are can be calculated later (lazy evaluation of columns). It seems to be very efficient in case of presence of filtering and limiting operators. But for some reasons it is not happen in this query. Certainly I can rewrite it using nested subselect, doing this optimization myself. But I find this optimization to be quite useful...

I can send example reproducing the problem with the latest devel branch. It is using slightly different database schema, dummy data and produce different estimations and timing.
But the problems are persist: optimizer choose the plan with worser cost and doesn't perform lazy evaluation of target list.

Thanks in advance,
Konstantin

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-01-05 14:57:58 Re: [PATCH] Refactoring of LWLock tranches
Previous Message Chapman Flack 2016-01-05 14:24:11 Re: dynloader.h missing in prebuilt package for Windows?