From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: On disable_cost |
Date: | 2024-07-02 14:57:27 |
Message-ID: | 591b3596-2ea0-4b8e-99c6-fad0ef2801f5@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 28/06/2024 18:46, Robert Haas wrote:
> On Wed, Jun 12, 2024 at 11:35 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Well, that didn't generate much discussion, but here I am trying
>> again. Here I've got patches 0001 and 0002 from my previous posting;
>> I've dropped 0003 and 0004 from the previous set for now so as not to
>> distract from the main event, but they may still be a good idea.
>> Instead I've got an 0003 and an 0004 that implement the "count of
>> disabled nodes" approach that we have discussed previously. This seems
>> to work fine, unlike the approaches I tried earlier. I think this is
>> the right direction to go, but I'd like to know what concerns people
>> might have.
>
> Here is a rebased patch set, where I also fixed pgindent damage and a
> couple of small oversights in 0004.
>
> I am hoping to get these committed some time in July. So if somebody
> thinks that's too soon or thinks it shouldn't happen at all, please
> don't wait too long to let me know about that.
v3-0001-Remove-grotty-use-of-disable_cost-for-TID-scan-pl.patch:
+1, this seems ready for commit
v3-0002-Rationalize-behavior-of-enable_indexscan-and-enab.patch:
I fear this will break people's applications, if they are currently
forcing a sequential scan with "set enable_indexscan=off". Now they will
need to do "set enable_indexscan=off; set enable_indexonlyscan=off" for
the same effect. Maybe it's acceptable, disabling sequential scans to
force an index scan is much more common than the other way round.
v3-0003-Treat-number-of-disabled-nodes-in-a-path-as-a-sep.patch:
> @@ -1318,6 +1342,12 @@ cost_tidscan(Path *path, PlannerInfo *root,
> startup_cost += path->pathtarget->cost.startup;
> run_cost += path->pathtarget->cost.per_tuple * path->rows;
>
> + /*
> + * There are assertions above verifying that we only reach this function
> + * either when enable_tidscan=true or when the TID scan is the only legal
> + * path, so it's safe to set disabled_nodes to zero here.
> + */
> + path->disabled_nodes = 0;
> path->startup_cost = startup_cost;
> path->total_cost = startup_cost + run_cost;
> }
So if you have enable_tidscan=off, and have a query with "WHERE CURRENT
OF foo" that is planned with a TID scan, we set disable_nodes = 0? That
sounds wrong, shouldn't disable_nodes be 1 in that case? It probably
cannot affect the rest of the plan, given that "WHERE CURRENT OF" is
only valid in an UPDATE or DELETE, but still. At least it deserves a
better explanation in the comment.
> diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
> index 6b64c4a362..20236e8c4d 100644
> --- a/src/backend/optimizer/plan/createplan.c
> +++ b/src/backend/optimizer/plan/createplan.c
> @@ -25,6 +25,7 @@
> #include "nodes/extensible.h"
> #include "nodes/makefuncs.h"
> #include "nodes/nodeFuncs.h"
> +#include "nodes/print.h"
> #include "optimizer/clauses.h"
> #include "optimizer/cost.h"
> #include "optimizer/optimizer.h"
left over from debugging?
> @@ -68,6 +68,15 @@ static bool pathlist_is_reparameterizable_by_child(List *pathlist,
> int
> compare_path_costs(Path *path1, Path *path2, CostSelector criterion)
> {
> + /* Number of disabled nodes, if different, trumps all else. */
> + if (unlikely(path1->disabled_nodes != path2->disabled_nodes))
> + {
> + if (path1->disabled_nodes < path2->disabled_nodes)
> + return -1;
> + else
> + return +1;
> + }
> +
> if (criterion == STARTUP_COST)
> {
> if (path1->startup_cost < path2->startup_cost)
Is "unlikely()" really appropriate here (and elsewhere in the patch)? If
you run with enable_seqscan=off but have no indexes, you could take that
path pretty often.
If this function needs optimizing, I'd suggest splitting it into two
functions, one for comparing the startup cost and another for the total
cost. Almost all callers pass a constant for that argument, so they
might as well call the correct function directly and avoid the branch
for that.
> @@ -658,6 +704,20 @@ add_path_precheck(RelOptInfo *parent_rel,
> Path *old_path = (Path *) lfirst(p1);
> PathKeysComparison keyscmp;
>
> + /*
> + * Since the pathlist is sorted by disabled_nodes and then by
> + * total_cost, we can stop looking once we reach a path with more
> + * disabled nodes, or the same number of disabled nodes plus a
> + * total_cost larger than the new path's.
> + */
> + if (unlikely(old_path->disabled_nodes != disabled_nodes))
> + {
> + if (disabled_nodes < old_path->disabled_nodes)
> + break;
> + }
> + else if (total_cost <= old_path->total_cost * STD_FUZZ_FACTOR)
> + break;
> +
> /*
> * We are looking for an old_path with the same parameterization (and
> * by assumption the same rowcount) that dominates the new path on
> @@ -666,39 +726,27 @@ add_path_precheck(RelOptInfo *parent_rel,
> *
> * Cost comparisons here should match compare_path_costs_fuzzily.
> */
> - if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR)
> + /* new path can win on startup cost only if consider_startup */
> + if (startup_cost > old_path->startup_cost * STD_FUZZ_FACTOR ||
> + !consider_startup)
> {
The "Cost comparisons here should match compare_path_costs_fuzzily"
comment also applies to the check on total_cost that you moved up. Maybe
move up the comment to the beginning of the loop.
v3-0004-Show-number-of-disabled-nodes-in-EXPLAIN-ANALYZE-.patch:
It's surprising that the "Disable Nodes" is printed even with the COSTS
OFF option. It's handy for our regression tests, it's good to print them
there, but it feels wrong.
Could we cram it into the "cost=... rows=..." part? And perhaps a marker
that a node was disabled would be more user friendly than showing the
cumulative count? Something like:
postgres=# set enable_material=off;
SET
postgres=# set enable_seqscan=off;
SET
postgres=# set enable_bitmapscan=off;
SET
postgres=# explain select * from foo, bar;
QUERY PLAN
------------------------------------------------------------------------------------
Nested Loop (cost=0.15..155632.40 rows=6502500 width=8)
-> Index Only Scan using foo_i_idx on foo (cost=0.15..82.41
rows=2550 width=4)
-> Seq Scan on bar (cost=0.00..35.50 (disabled) rows=2550 width=4)
(5 rows)
--
Heikki Linnakangas
Neon (https://neon.tech)
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2024-07-02 15:11:22 | Re: Cleaning up perl code |
Previous Message | Alvaro Herrera | 2024-07-02 14:55:01 | Re: LogwrtResult contended spinlock |