Re: Wrong cost estimation for foreign tables join with use_remote_estimate disabled

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wrong cost estimation for foreign tables join with use_remote_estimate disabled
Date: 2018-06-22 09:15:53
Message-ID: 5B2CBE49.8070006@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Konstantin,

(2018/06/22 15:26), Konstantin Knizhnik wrote:
> On 21.06.2018 20:08, Tom Lane wrote:
>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>> The following very simple test reduce the problem with wrong cost
>>> estimation:
>>> create foreign table t1_fdw(x integer, y integer) server pg_fdw options
>>> (table_name 't1', use_remote_estimate 'false');
>>> create foreign table t2_fdw(x integer) server pg_fdw options (table_name
>>> 't2', use_remote_estimate 'false');
>>> It is possible to force Postgres to use correct plan by setting
>>> "fdw_startup_cost" to some very large value (100000000 for example).
>>> ...
>>> Also correct plan is used when use_remote_estimate is true.
>> If you are unhappy about the results with use_remote_estimate off, don't
>> run it that way. The optimizer does not have a crystal ball.
>
> As I wrote, use_remote_estimate can not be used because in this case
> query compilation time is unacceptable (10 seconds, while time of query
> execution itself is ~200msec).
> So the problem can be addressed in two ways:
>
> 1. Try to reduce time of remote estimation. I wonder why postgres_fdw
> sends so much queries to remote server. For join of two tables there are
> 7 queries.
> I suspect that for ~20 joined tables in the original query number of
> calls is more than hundred, so on wonder that it takes so much time.
> 2. Try to make optimizer make better estimation of join cost based on
> local statistic (please notice that ANALYZE is explicitly called for all
> foreign tables and number of rows in the result was correctly calculated).

To make local estimates more accurate, I think we need other information
on remote tables such as remote indexes.

> What do you think: which of this two direction is more perspective? Or
> it is better to address both of them?

I'd vote for #2. One idea for that is to introduce CREATE FOREIGN INDEX
to have information on remote indexes on the local side, which I
proposed before. I have been putting it on hold since then, though.

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-06-22 09:23:36 Re: add default parallel query to v10 release notes? (Re: [PERFORM] performance drop after upgrade (9.6 > 10))
Previous Message Amit Kapila 2018-06-22 09:15:01 Re: server crashed with TRAP: FailedAssertion("!(!parallel_aware || pathnode->path.parallel_safe)"