Re: Problems with plan estimates in postgres_fdw

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andrew(at)tao11(dot)riddles(dot)org(dot)uk
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problems with plan estimates in postgres_fdw
Date: 2018-08-02 02:10:13
Message-ID: 20180802.111013.212913738.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello.

At Thu, 02 Aug 2018 01:06:41 +0100, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote in <87pnz1aby9(dot)fsf(at)news-spur(dot)riddles(dot)org(dot)uk>
> This analysis comes from investigating a report from an IRC user. A
> summary of the initial report is:
>
> Using PG 9.6.9 and postgres_fdw, a query of the form "select * from
> foreign_table order by col limit 1" is getting a local Sort plan, not
> pushing the ORDER BY to the remote. Turning off use_remote_estimates
> changes the plan to use a remote sort, with a 10000x speedup.
>
> I don't think this can be called a bug, exactly, and I don't have an
> immediate fix, so I'm putting this analysis up for the benefit of anyone
> working on this in future.

I haven't seen the concrete estimates, but the cause seems to be
that the total cost of the non-remote-sorted plan is too small
compared with the startup cost of the remote-sorted one. In other
words, the per-tuple cost of fetching rows from the remote is
estimated as too small. Perhaps setting fdw_tuple_cost to, say, 1
as an extreme value would bring victory to the remote-sort path
for this query.
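To make that concrete, here is a toy sketch (with entirely
hypothetical numbers, loosely in the spirit of the report; this is
not the planner's actual code). For LIMIT 1 the remote-sorted path
is judged almost entirely on its startup cost, while the local
top-1 sort must first pull every row across the connection, so its
cost grows linearly with fdw_tuple_cost:

```python
# Hypothetical costs, loosely following the report's figures.
remote_total = 1.2e6         # total cost of the unordered foreign scan
remote_sort_startup = 8.0e6  # startup cost of the remote seqscan+sort path
nrows = 10_000_000           # rows that must cross the wire for a local sort
local_sort_overhead = 3.0e5  # hypothetical cost of the local top-1 sort

def local_sort_path_cost(fdw_tuple_cost):
    # The local sort cannot emit its first row until the whole unordered
    # foreign scan has been consumed, so the per-tuple transfer cost is
    # paid in full; the remote-sorted path's *startup* cost is unaffected.
    return remote_total + fdw_tuple_cost * nrows + local_sort_overhead

# With the default fdw_tuple_cost the local sort looks far cheaper:
assert local_sort_path_cost(0.01) < remote_sort_startup
# With the extreme fdw_tuple_cost = 1, transfer dominates and remote sort wins:
assert local_sort_path_cost(1.0) > remote_sort_startup
```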

> The cause of the misplan seems to be this: postgres_fdw with
> use_remote_estimates on does not attempt to obtain fast-start plans from
> the remote. In this case what happens is this:
>
> 1. postgres_fdw gets the cost estimate from the plain remote fetch, by
> doing "EXPLAIN select * from table". This produces a plan with a low
> startup cost (just the constant overhead) and a high total cost (on
> the order of 1.2e6 in this case).
>
> 2. postgres_fdw gets the cost estimate for the ordered fetch, by doing
> "EXPLAIN select * from table order by col". Note that there is no
> LIMIT nor any cursor_tuple_fraction in effect, so the plan returned
> in this case is a seqscan+sort plan (in spite of the presence of an
> index on "col"), with a very high (order of 8e6) startup and total
> cost.
>
> So when the local side tries to generate paths, it has the choice of
> using a remote-ordered path with startup cost 8e6, or a local top-1
> sort on top of an unordered remote path, which has a total cost on the
> order of 1.5e6 in this case; cheaper than the remote sort because this
> only needs to do top-1, while the remote is sorting millions of rows
> and would probably spill to disk.

A simple test at hand showed the following (over a Unix-domain
socket connection):

=# explain (verbose on, analyze on) select * from ft1 order by a;
> Foreign Scan on public.ft1 (cost=9847.82..17097.82 rows=100000 width=4)
> (actual time=195.861..515.747 rows=100000 loops=1)

=# explain (verbose on, analyze on) select * from ft1;
> Foreign Scan on public.ft1 (cost=100.00..8543.00 rows=100000 width=4)
> (actual time=0.659..399.427 rows=100000 loops=1)

The cost is apparently wrong. In my environment, fdw_startup_cost
= 45 and fdw_tuple_cost = 0.2 gave me an even cost/actual-time
ratio *for these queries*. (The hard-coded defaults are 100 and
0.01. Of course, this discussion ignores the accuracy of the
local-execution estimates.)

=# explain (verbose on, analyze on) select * from ft1 order by a;
> Foreign Scan on public.ft1 (cost=9792.82..31042.82 rows=100000 width=4)
> (actual time=201.493..533.913 rows=100000 loops=1)

=# explain (verbose on, analyze on) select * from ft1;
> Foreign Scan on public.ft1 (cost=45.00..22488.00 rows=100000 width=4)
> (actual time=0.837..484.469 rows=100000 loops=1)

This gave me a remote-sorted plan for "select * from ft1 order by
a limit 1". (But it also gave me a remote-sorted plan without a
LIMIT.)

> However, when it comes to actual execution, postgres_fdw opens a cursor
> for the remote query, which means that cursor_tuple_fraction will come
> into play. As far as I can tell, this is not set anywhere, so this means
> that the plan that actually gets run on the remote is likely to have
> _completely_ different costs from those returned by the EXPLAINs. In
> particular, in this case the fast-start index-scan plan for the ORDER BY
> remote query is clearly being chosen when use_remote_estimates is off
> (since the query completes in 15ms rather than 150 seconds).
>
> One possibility: would it be worth adding an option to EXPLAIN that
> makes it assume cursor_tuple_fraction?

cursor_tuple_fraction seems to be working, since the foreign scan
with the remote sort has different startup and total cost values.
The problem seems to be a too-small tuple cost.
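To illustrate why the cursor matters (a toy sketch with made-up
costs, not the actual planner code): with a tuple fraction in
effect, the remote planner scores each plan on its startup cost
plus only a fraction of its run cost, which is what lets a
fast-start index scan beat a seqscan+sort:

```python
def fractional_cost(startup, total, fraction):
    # How a plan is scored when only a fraction of its rows are
    # expected to be fetched (e.g. through a cursor).
    return startup + fraction * (total - startup)

# Made-up costs in the spirit of the report:
index_scan = (0.43, 12.0e6)    # fast start, high total cost
seqscan_sort = (8.0e6, 8.2e6)  # slow start, cheaper in total

# Fetching everything favors the seqscan+sort...
assert fractional_cost(*seqscan_sort, 1.0) < fractional_cost(*index_scan, 1.0)
# ...but a 10% cursor fraction favors the fast-start index scan.
assert fractional_cost(*index_scan, 0.1) < fractional_cost(*seqscan_sort, 0.1)
```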

So we might have room for improvement in the
DEFAULT_FDW_STARTUP_COST, DEFAULT_FDW_TUPLE_COST and
DEFAULT_FDW_SORT_MULTIPLIER settings.
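For reference, a rough sketch of the heuristic these constants
drive when use_remote_estimate is off (an approximation of
estimate_path_cost_size; the real code's bookkeeping differs in
detail, and locally-evaluated quals are omitted here):

```python
# Defaults as hard-coded in contrib/postgres_fdw/postgres_fdw.c.
DEFAULT_FDW_STARTUP_COST = 100.0   # per-connection overhead
DEFAULT_FDW_TUPLE_COST = 0.01      # per-row network transfer cost
DEFAULT_FDW_SORT_MULTIPLIER = 1.2  # assumed overhead of a remote sort
CPU_TUPLE_COST = 0.01              # core planner default

def foreign_path_cost(startup, total, rows, sorted_remotely=False):
    """Approximate cost of a foreign path, without remote estimates."""
    if sorted_remotely:
        # Without a remote EXPLAIN, a remotely-sorted path is simply
        # assumed to cost 20% more than the unsorted one.
        startup *= DEFAULT_FDW_SORT_MULTIPLIER
        total *= DEFAULT_FDW_SORT_MULTIPLIER
    startup += DEFAULT_FDW_STARTUP_COST
    total += DEFAULT_FDW_STARTUP_COST
    total += (DEFAULT_FDW_TUPLE_COST + CPU_TUPLE_COST) * rows
    return startup, total
```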

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
