From: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Wrong cost estimation for foreign tables join with use_remote_estimate disabled |
Date: | 2018-06-22 09:15:53 |
Message-ID: | 5B2CBE49.8070006@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Konstantin,
(2018/06/22 15:26), Konstantin Knizhnik wrote:
> On 21.06.2018 20:08, Tom Lane wrote:
>> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> writes:
>>> The following very simple test reduce the problem with wrong cost
>>> estimation:
>>> create foreign table t1_fdw(x integer, y integer) server pg_fdw options
>>> (table_name 't1', use_remote_estimate 'false');
>>> create foreign table t2_fdw(x integer) server pg_fdw options (table_name
>>> 't2', use_remote_estimate 'false');
>>> It is possible to force Postgres to use correct plan by setting
>>> "fdw_startup_cost" to some very large value (100000000 for example).
>>> ...
>>> Also correct plan is used when use_remote_estimate is true.
>> If you are unhappy about the results with use_remote_estimate off, don't
>> run it that way. The optimizer does not have a crystal ball.
>
> As I wrote, use_remote_estimate can not be used because in this case
> query compilation time is unacceptable (10 seconds, while time of query
> execution itself is ~200msec).
> So the problem can be addressed in two ways:
>
> 1. Try to reduce time of remote estimation. I wonder why postgres_fdw
> sends so much queries to remote server. For join of two tables there are
> 7 queries.
> I suspect that for ~20 joined tables in the original query number of
> calls is more than hundred, so on wonder that it takes so much time.
> 2. Try to make optimizer make better estimation of join cost based on
> local statistic (please notice that ANALYZE is explicitly called for all
> foreign tables and number of rows in the result was correctly calculated).
To make local estimates more accurate, I think we need other information
on remote tables such as remote indexes.
> What do you think: which of this two direction is more perspective? Or
> it is better to address both of them?
I'd vote for #2. One idea for that is to introduce CREATE FOREIGN INDEX
to have information on remote indexes on the local side, which I
proposed before. I have been putting it on hold since then, though.
Best regards,
Etsuro Fujita
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2018-06-22 09:23:36 | Re: add default parallel query to v10 release notes? (Re: [PERFORM] performance drop after upgrade (9.6 > 10)) |
Previous Message | Amit Kapila | 2018-06-22 09:15:01 | Re: server crashed with TRAP: FailedAssertion("!(!parallel_aware || pathnode->path.parallel_safe)" |