Re: Problems with plan estimates in postgres_fdw

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problems with plan estimates in postgres_fdw
Date: 2019-02-19 08:25:30
Message-ID: 5C6BBD7A.1030702@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2019/02/18 23:21), Antonin Houska wrote:
> Etsuro Fujita<fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> (2019/02/15 21:46), Antonin Houska wrote:
>>> ok, I understand now. I assume that the patch
>>>
>>> https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp
>>>
>>> obsoletes the code snippet we discussed above.
>>
>> Sorry, I don't understand this. Could you elaborate on that?
>
> I thought that the assignments
>
> + startup_cost += outerrel->reltarget->cost.startup;
>
> and
>
> + run_cost += outerrel->reltarget->cost.per_tuple * input_rows;
>
> in the patch you posted in
> https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp do
> replace
>
> + startup_cost += foreignrel->reltarget->cost.startup;
>
> and
>
> + run_cost += foreignrel->reltarget->cost.per_tuple * rows;
>
> respectively, which you originally included in the following part of
> 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch:
>
> @@ -2829,6 +2872,22 @@ estimate_path_cost_size(PlannerInfo *root,
> run_cost += foreignrel->reltarget->cost.per_tuple * rows;
> }
>
> + /*
> + * If this is an UPPERREL_ORDERED step on the final scan/join
> + * relation, the costs obtained from the cache wouldn't yet contain
> + * the eval cost for the final scan/join target, which would have been
> + * updated by apply_scanjoin_target_to_paths(); add the eval cost now.
> + */
> + if (fpextra&& !IS_UPPER_REL(foreignrel))
> + {
> + /* The costs should have been obtained from the cache. */
> + Assert(fpinfo->rel_startup_cost>= 0&&
> + fpinfo->rel_total_cost>= 0);
> +
> + startup_cost += foreignrel->reltarget->cost.startup;
> + run_cost += foreignrel->reltarget->cost.per_tuple * rows;
> + }
> +

Maybe, my explanation in that thread was not enough, but the changes
proposed by the patch posted there wouldn't obsolete the above. Let me
explain using a foreign-table variant of the example shown in the
comments for make_group_input_target():

SELECT a+b, sum(c+d) FROM foreign_table GROUP BY a+b;

When called from postgresGetForeignPaths(), the reltarget for the base
relation foreign_table would be {a, b, c, d}, for the same reason as the
LIMIT example in [1], and the foreign_table's rel_startup_cost and
rel_total_cost would be calculated based on this reltarget in that
callback routine. (The tlist eval costs would be zeroes, though). BUT:
before called from postgresGetForeignUpperPaths() with the
UPPERREL_GROUP_AGG stage, the reltarget would be replaced with {a+b, c,
d}, which is the final scan target as explained in the comments for
make_group_input_target(), and the eval costs of the new reltarget
wouldn't be zeros, so the costs of scanning the foreign_table would need
to be adjusted to include the eval costs. That's why I propose the
assignments in estimate_path_cost_size() shown above.

On the other hand, the code bit added by
0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch
handles the case where a post-scan/join processing step other than
grouping/aggregation (ie, the final sort or LIMIT/OFFSET) is performed
directly on a base or join relation, as in the LIMIT example. So, the
two changes are handling different cases, hence both changes would be
required.

(I think it might be possible that we can merge the two changes into one
by some refactoring of estimate_path_cost_size() or something else, but
I haven't considered that hard yet. Should we do that?)

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5C669DFD.30002%40lab.ntt.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2019-02-19 08:50:31 Re: Speed up transaction completion faster after many relations are accessed in a transaction
Previous Message Masahiko Sawada 2019-02-19 08:09:33 Re: Copy function for logical replication slots