Re: Cost overestimation of foreign JOIN

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost overestimation of foreign JOIN
Date: 2020-12-01 13:17:50
Message-ID: CAExHW5sQwaaUdesHK1p4VU_jzygbPDT0h=6C1xcEnOFX+HJLow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 30, 2020 at 11:56 PM Andrey Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> On 30.11.2020 22:38, Tom Lane wrote:
> > Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
> >> Maybe it is needed to swap lines 2908 and 2909 (see attachment)?
> >
> > No; as explained in the comment immediately above here, we're assuming
> > that the join conditions will be applied on the cross product of the
> > input relations.
>
> Thank you. Now it is clear to me.
> >
> > Now admittedly, that's a worst-case assumption, since it amounts to
> > expecting that the remote server will do the join in the dumbest
> > possible nested-loop way. If the remote can use a merge or hash
> > join, for example, the cost is likely to be a lot less.
>
> My goal is scaling Postgres on a set of the same servers with same
> postgres instances. If one server uses for the join a hash-join node, i
> think it is most likely that the other server will also use for this
> join a hash-join node (Maybe you remember, I also use the statistics
> copying technique to provide up-to-date statistics on partitions). Tests
> show good results with such an approach. But maybe this is my special case.
>
> > But it is
> > not the job of this code path to outguess the remote planner. It's
> > certainly not appropriate to invent an unprincipled cost estimate
> > as a substitute for trying to guess that.
>
> Agreed.
> >
> > If you're unhappy with the planning results you get for this,
> > why don't you have use_remote_estimate turned on?
>
> I have a mixed load model. Large queries are suitable for additional
> estimate queries. But for many simple SELECT's that touch a small
> portion of the data, the latency has increased significantly. And I
> don't know how to switch the use_remote_estimate setting in such case.

You may disable use_remote_estimates for given table or a server. So
if tables participating in short queries are different from those in
the large queries, you could set use_remote_estimate at table level to
turn it off for the first set. Otherwise, we need a FDW level GUC
which can be turned on/off for a given session or a query.

Generally use_remote_estimate isn't scalable and there have been
discussions about eliminating the need of it. But no concrete proposal
has come yet.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-12-01 13:43:38 Re: Consider parallel for lateral subqueries with limit
Previous Message Amit Kapila 2020-12-01 13:17:29 Re: Notes on physical replica failover with logical publisher or subscriber