Re: Problems with plan estimates in postgres_fdw

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problems with plan estimates in postgres_fdw
Date: 2019-02-25 10:31:31
Message-ID: 5C73C403.7080109@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2019/02/22 22:54), Antonin Houska wrote:
> Etsuro Fujita<fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Maybe, my explanation in that thread was not enough, but the changes proposed
>> by the patch posted there wouldn't obsolete the above. Let me explain using a
>> foreign-table variant of the example shown in the comments for
>> make_group_input_target():
>>
>> SELECT a+b, sum(c+d) FROM foreign_table GROUP BY a+b;
>>
>> When called from postgresGetForeignPaths(), the reltarget for the base
>> relation foreign_table would be {a, b, c, d}, for the same reason as the LIMIT
>> example in [1], and the foreign_table's rel_startup_cost and rel_total_cost
>> would be calculated based on this reltarget in that callback routine. (The
>> tlist eval costs would be zeroes, though). BUT: before called from
>> postgresGetForeignUpperPaths() with the UPPERREL_GROUP_AGG stage, the
>> reltarget would be replaced with {a+b, c, d}, which is the final scan target
>> as explained in the comments for make_group_input_target(), and the eval costs
>> of the new reltarget wouldn't be zeros, so the costs of scanning the
>> foreign_table would need to be adjusted to include the eval costs. That's why
>> I propose the assignments in estimate_path_cost_size() shown above.
>
> Yes, apply_scanjoin_target_to_paths() can add some more costs, including those
> introduced by make_group_input_target(), which postgres_fdw needs to account
> for. This is what you fix in
>
> https://www.postgresql.org/message-id/5C66A056.60007%40lab.ntt.co.jp

That's right. (I'll post a new version of the patch to address your
comments later.)

>> On the other hand, the code bit added by
>> 0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch handles the
>> case where a post-scan/join processing step other than grouping/aggregation
>> (ie, the final sort or LIMIT/OFFSET) is performed directly on a base or join
>> relation, as in the LIMIT example. So, the two changes are handling different
>> cases, hence both changes would be required.
>
> Do you mean this part?

No, I don't. What I meant was this part:

+ /*
+ * If this is an UPPERREL_ORDERED step performed on the final
+ * scan/join relation, the costs obtained from the cache
wouldn't yet
+ * contain the eval costs for the final scan/join target, which
would
+ * have been updated by apply_scanjoin_target_to_paths(); add
the eval
+ * costs now.
+ */
+ if (fpextra && !IS_UPPER_REL(foreignrel))
+ {
+ /* The costs should have been obtained from the cache. */
+ Assert(fpinfo->rel_startup_cost >= 0 &&
+ fpinfo->rel_total_cost >= 0);
+
+ startup_cost += foreignrel->reltarget->cost.startup;
+ run_cost += foreignrel->reltarget->cost.per_tuple * rows;
+ }

The case handled by this would be different from one handled by the fix,
but as I said in the nearby thread, this part might be completely
redundant. Will re-consider about this.

> + /*
> + * If this includes an UPPERREL_ORDERED step, the given target, which
> + * would be the final target to be applied to the resulting path, might
> + * have different expressions from the underlying relation's reltarget
> + * (see make_sort_input_target()); adjust tlist eval costs.
> + */
> + if (fpextra&& fpextra->target != foreignrel->reltarget)
> + {
> + QualCost oldcost = foreignrel->reltarget->cost;
> + QualCost newcost = fpextra->target->cost;
> +
> + startup_cost += newcost.startup - oldcost.startup;
> + total_cost += newcost.startup - oldcost.startup;
> + total_cost += (newcost.per_tuple - oldcost.per_tuple) * rows;
> + }
>
> You should not need this. Consider what grouping_planner() does if
> (!have_grouping&& !activeWindows&& parse->sortClause != NIL):
>
> sort_input_target = make_sort_input_target(...);
> ...
> grouping_target = sort_input_target;
> ...
> scanjoin_target = grouping_target;
> ...
> apply_scanjoin_target_to_paths(...);
>
> So apply_scanjoin_target_to_paths() effectively accounts for the additional
> costs introduced by make_sort_input_target().

Yeah, but I think the code bit cited above is needed. Let me explain
using yet another example with grouping and ordering:

SELECT a+b, random() FROM foreign_table GROUP BY a+b ORDER BY a+b;

For this query, the sort_input_target would be {a+b}, (to which the
foreignrel->reltarget would have been set when called from
estimate_path_cost_size() called from postgresGetForeignUpperPaths()
with the UPPERREL_ORDERED stage), but the final target would be {a+b,
random()}, so the sort_input_target isn't the same as the final target
in that case; the costs needs to be adjusted so that the costs include
the ones of generating the final target. That's the reason why I added
the code bit you cited.

> Note about the !activeWindows test: It occurs me now that no paths should be
> added to UPPERREL_ORDERED relation if the query needs WindowAgg node, because
> postgres_fdw currently cannot handle UPPERREL_WINDOW relation. Or rather in
> general, the FDW should not skip any stage just because it doesn't know how to
> build the paths.

That's right. In postgres_fdw, by the following code in
postgresGetForeignUpperPaths(), we will give up on doing the
UPPERREL_ORDERED step remotely, if the query has the UPPERREL_WINDOW
(and/or UPPERREL_DISTINCT) steps, because in that case we would have
input_rel->fdw_private=NULL for a call to postgresGetForeignUpperPaths()
with the UPPERREL_ORDERED stage:

/*
* If input rel is not safe to pushdown, then simply return as we
cannot
* perform any post-join operations on the foreign server.
*/
if (!input_rel->fdw_private ||
!((PgFdwRelationInfo *) input_rel->fdw_private)->pushdown_safe)
return;

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari Mannsåker 2019-02-25 10:57:09 Re: Remove Deprecated Exclusive Backup Mode
Previous Message Masahiko Sawada 2019-02-25 10:17:02 Re: reloption to prevent VACUUM from truncating empty pages at the end of relation