Re: Problems with plan estimates in postgres_fdw

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problems with plan estimates in postgres_fdw
Date: 2019-03-07 10:27:33
Message-ID: 5C80F215.7060905@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2019/03/06 22:00), Etsuro Fujita wrote:
> (2019/02/25 18:40), Etsuro Fujita wrote:
>> (2019/02/23 0:21), Antonin Houska wrote:
>>> Etsuro Fujita<fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
>>> What about an ORDER BY expression that contains multiple Var nodes? For
>>> example
>>>
>>> SELECT * FROM foo ORDER BY x + y;
>
>> Actually, add_paths_with_pathkeys_for_rel() generates such pre-sorted
>> paths for the base relation, as shown in the below example using HEAD
>> without the patchset proposed in this thread:
>>
>> postgres=# explain verbose select a+b from ft1 order by a+b;
>> QUERY PLAN
>> --------------------------------------------------------------------------
>>
>> Foreign Scan on public.ft1 (cost=100.00..200.32 rows=2560 width=4)
>> Output: (a + b)
>> Remote SQL: SELECT a, b FROM public.t1 ORDER BY (a + b) ASC NULLS LAST
>> (3 rows)
>>
>> I think it is OK for that function to generate such paths, as tlists for
>> such paths would be adjusted in apply_scanjoin_target_to_paths(), by
>> doing create_projection_path() to them.
>>
>> Conversely, it appears that add_foreign_ordered_paths() added by the
>> patchset would generate such pre-sorted paths *redundantly* when the
>> input_rel is the final scan/join relation. Will look into that.
>
> As mentioned above, I noticed that we generated a properly-sorted path
> redundantly in add_foreign_ordered_paths(), for the case where 1) the
> input_rel is the final scan/join relation and 2) the input_rel already
> has a properly-sorted path in its pathlist that was created by
> add_paths_with_pathkeys_for_rel(). So, I modified
> add_foreign_ordered_paths() to skip creating a path in that case.
>
> (2019/02/25 19:31), Etsuro Fujita wrote:
> > + /*
> > + * If this is an UPPERREL_ORDERED step performed on the final
> > + * scan/join relation, the costs obtained from the cache wouldn't yet
> > + * contain the eval costs for the final scan/join target, which would
> > + * have been updated by apply_scanjoin_target_to_paths(); add the eval
> > + * costs now.
> > + */
> > + if (fpextra && !IS_UPPER_REL(foreignrel))
> > + {
> > + /* The costs should have been obtained from the cache. */
> > + Assert(fpinfo->rel_startup_cost >= 0 &&
> > + fpinfo->rel_total_cost >= 0);
> > +
> > + startup_cost += foreignrel->reltarget->cost.startup;
> > + run_cost += foreignrel->reltarget->cost.per_tuple * rows;
> > + }
>
> > but as I said in the nearby thread, this part might be completely
> > redundant. Will re-consider about this.
>
> I thought that if it was true that in add_foreign_ordered_paths() we
> didn't need to consider pushing down the final sort to the remote in the
> case where the input_rel to that function is the final scan/join
> relation, the above code could be entirely removed from
> estimate_path_cost_size(), but I noticed that that is wrong; as we do
> not always have a properly-sorted path in the input_rel's pathlist
> already. So, I think we need to keep the above code so that we we can
> consider the final sort pushdown for the final scan/join relation in
> add_foreign_ordered_paths(). Sorry for the confusion. I moved the above
> code to the place we get cached costs, which I hope makes
> estimate_path_cost_size() a bit more readable.
>
> Other changes:
>
> * I modified add_foreign_final_paths() to skip creating a path if
> possible, in a similar way to add_foreign_ordered_paths().
> * I fixed the issue pointed out by Jeff [1].
> * I added more comments.

One thing I forgot to mention is a bug fix: when costing an *ordered*
foreign-grouping path using local statistics in
estimate_path_cost_size(), we get the rows estimate from the
foreignrel->rows, but in the previous version, the foreignrel->rows for
the grouping relation was not updated accordingly, so the rows estimate
was set to 0. I fixed this.

And I noticed that I sent in a WIP version of the patch set :(. Sorry
for that. That version wasn't modified well enough, especially to add
the fast-path mentioned above. Please find attached an updated version
(v6) of the patch set.

Best regards,
Etsuro Fujita

Attachment Content-Type Size
v6-0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely.patch text/x-patch 43.7 KB
v6-0002-Refactor-create_limit_path-to-share-cost-adjustment-.patch text/x-patch 4.8 KB
v6-0003-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely.patch text/x-patch 101.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2019-03-07 10:32:39 Re: Question about pg_upgrade from 9.2 to X.X
Previous Message Alexander Korotkov 2019-03-07 09:57:40 Re: [PROPOSAL] Drop orphan temp tables in single-mode