Re: Foreign join search stops on the first try

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Foreign join search stops on the first try
Date: 2022-01-25 16:37:32
Message-ID: ad549fc7fcb10998cca7485d61c75bdb@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ashutosh Bapat писал 2022-01-25 17:08:
> This code was written long ago. So I may have some recollection
> errors. But AFAIR, the reasons we wanted to avoid repeated
> estimation/planning for the same foreign join rel were
> 1. If use_remote_estimate = true, we fetch EXPLAIN output from the
> foreign server for various pathkeys. Fetching EXPLAIN output is
> expensive. Irrespective of the join order being considered locally, we
> expect the foreign server to give us the same cost since the join is
> the same. So we avoid running EXPLAIN again and again.
> 2. If use_remote_estimate = false, the logic to estimate a foreign
> join locally is independent of the join order so should yield same
> cost again and again. For some reason that doesn't seem to be the case
> here.

Hi.
use_remote_estimate was set to false in our case, and yes, it fixed this
issue.
The problem is that if use_remote_estimate = false, the logic to
estimate a foreign join locally
is not independent from the join order.

In above example, without patch we see plan with cost:
cost=382.31..966.86 rows=2 width=37

If we avoid exiting on (joinrel->fdw_private), we can see in gdb the
following cases, when joining all 3 relations:

case 1:
outerrel:relids (stock, order_line), startup_cost = 100, total_cost =
2415.9200000000001, rel_startup_cost = 0, rel_total_cost = 2315.5,
retrieved_rows = 21
innerrel: relid (district) startup_cost = 100, total_cost =
101.14500000000001, rel_startup_cost = 0, rel_total_cost = 1.125,
retrieved_rows = 1
joinrel: startup_cost = 100, total_cost = 2416.875, retrieved_rows = 2

case 2:
outerrel: relids (district, order_line), startup_cost = 100, total_cost
= 281.41999999999996, rel_total_cost = 180, retrieved_rows = 71
innerrel: relid (stock), startup_cost = 100, total_cost =
683.28500000000008, rel_startup_cost = 0, rel_total_cost = 576.625,
retrieved_rows = 333
joinrel: startup_cost = 100, total_cost = 974.88, retrieved_rows = 2

So, (stock join order_line) join district has different cost from
(district join order_line) join stock.

>
> On Tue, Jan 25, 2022 at 1:26 PM Alexander Pyhalov
> <a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
>
>
> It is surprising that the planning time halves with the patch. I
> expected it to increase slightly since we will compute estimates
> thrice instead of once.

I wouldn't look at estimate times here precisely (and would looked at
costs). Real example where we found it had 100 times more data, but
effect was the same. Here some differences in planing time could be
related to restarting instances with or without patches.

>
> What is use_remote_estimate? Is it ON/OFF?
>

Yes, it was off.

> If we want to proceed along this line, we should take care not to fire
> more EXPLAIN queries on the foreign server.

You are correct. Fixed patch to avoid extensive join search when
use_remote_estimate is true.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
v02-0001-Look-through-all-possible-foreign-join-orders.patch text/x-diff 3.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2022-01-25 16:42:14 Re: CREATEROLE and role ownership hierarchies
Previous Message Tom Lane 2022-01-25 16:32:35 Re: Fix BUG #17335: Duplicate result rows in Gather node