Re: Problems with plan estimates in postgres_fdw

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problems with plan estimates in postgres_fdw
Date: 2019-03-20 11:47:06
Message-ID: 5C92283A.40208@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

(2019/03/07 19:27), Etsuro Fujita wrote:
> (2019/03/06 22:00), Etsuro Fujita wrote:
>> (2019/02/25 18:40), Etsuro Fujita wrote:
>>> (2019/02/23 0:21), Antonin Houska wrote:
>>>> Etsuro Fujita<fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>>>> What about an ORDER BY expression that contains multiple Var nodes? For
>>>> example
>>>>
>>>> SELECT * FROM foo ORDER BY x + y;
>>
>>> Actually, add_paths_with_pathkeys_for_rel() generates such pre-sorted
>>> paths for the base relation, as shown in the below example using HEAD
>>> without the patchset proposed in this thread:
>>>
>>> postgres=# explain verbose select a+b from ft1 order by a+b;
>>> QUERY PLAN
>>> --------------------------------------------------------------------------
>>>
>>>
>>> Foreign Scan on public.ft1 (cost=100.00..200.32 rows=2560 width=4)
>>> Output: (a + b)
>>> Remote SQL: SELECT a, b FROM public.t1 ORDER BY (a + b) ASC NULLS LAST
>>> (3 rows)
>>>
>>> I think it is OK for that function to generate such paths, as tlists for
>>> such paths would be adjusted in apply_scanjoin_target_to_paths(), by
>>> doing create_projection_path() to them.
>>>
>>> Conversely, it appears that add_foreign_ordered_paths() added by the
>>> patchset would generate such pre-sorted paths *redundantly* when the
>>> input_rel is the final scan/join relation. Will look into that.
>>
>> As mentioned above, I noticed that we generated a properly-sorted path
>> redundantly in add_foreign_ordered_paths(), for the case where 1) the
>> input_rel is the final scan/join relation and 2) the input_rel already
>> has a properly-sorted path in its pathlist that was created by
>> add_paths_with_pathkeys_for_rel(). So, I modified
>> add_foreign_ordered_paths() to skip creating a path in that case.

I had a rethink about this and noticed that the previous version was not
enough; since query_pathkeys is set to root->sort_pathkeys in that case
(ie, the case where the input_rel is a base or join relation), we would
already have considered pushing down the final sort when creating
pre-sorted foreign paths for the input_rel in postgresGetForeignPaths()
or postgresGetForeignJoinPaths(), so *no work* in that case. I modified
the patch as such.

>> (2019/02/25 19:31), Etsuro Fujita wrote:
>> > + /*
>> > + * If this is an UPPERREL_ORDERED step performed on the final
>> > + * scan/join relation, the costs obtained from the cache wouldn't yet
>> > + * contain the eval costs for the final scan/join target, which would
>> > + * have been updated by apply_scanjoin_target_to_paths(); add the eval
>> > + * costs now.
>> > + */
>> > + if (fpextra && !IS_UPPER_REL(foreignrel))
>> > + {
>> > + /* The costs should have been obtained from the cache. */
>> > + Assert(fpinfo->rel_startup_cost >= 0 &&
>> > + fpinfo->rel_total_cost >= 0);
>> > +
>> > + startup_cost += foreignrel->reltarget->cost.startup;
>> > + run_cost += foreignrel->reltarget->cost.per_tuple * rows;
>> > + }
>>
>> > but as I said in the nearby thread, this part might be completely
>> > redundant. Will re-consider about this.
>>
>> I thought that if it was true that in add_foreign_ordered_paths() we
>> didn't need to consider pushing down the final sort to the remote in the
>> case where the input_rel to that function is the final scan/join
>> relation, the above code could be entirely removed from
>> estimate_path_cost_size(), but I noticed that that is wrong;

I was wrong here; we don't need to consider pushing down the final sort
in that case, as mentioned above, so we could remove that code. Sorry
for the confusion.

>> I moved the above
>> code to the place we get cached costs, which I hope makes
>> estimate_path_cost_size() a bit more readable.

I moved that code from the UPPERREL_ORDERED patch to the UPPERREL_FINAL
patch, because we still need that code for estimating the costs of a
foreign path created in add_foreign_final_paths() defined in the latter
patch.

I polished the patches; revised code, added some more regression tests,
and tweaked comments further.

Attached is an updated version of the patch set.

Best regards,
Etsuro Fujita

Attachment Content-Type Size
v7-0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely.patch text/x-patch 45.2 KB
v7-0002-Refactor-create_limit_path-to-share-cost-adjustment-.patch text/x-patch 4.8 KB
v7-0003-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely.patch text/x-patch 110.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-03-20 11:51:08 Re: Should we add GUCs to allow partition pruning to be disabled?
Previous Message David Rowley 2019-03-20 11:43:26 Re: BUG #15572: Misleading message reported by "Drop function operation" on DB with functions having same name