Re: Proposal: add a debug message about using geqo

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Jacob Champion <jchampion(at)timescale(dot)com>
Cc: KAWAMOTO Masaya <kawamoto(at)sraoss(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: add a debug message about using geqo
Date: 2022-07-27 23:43:11
Message-ID: CAKFQuwafSDvN-U3uZwLS8tMCBHoXyWTRqtevWO8637e3kihTFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 22, 2022 at 1:20 PM Jacob Champion <jchampion(at)timescale(dot)com>
wrote:

> On Wed, Jun 1, 2022 at 11:09 PM KAWAMOTO Masaya <kawamoto(at)sraoss(dot)co(dot)jp>
> wrote:
> > That sounds a nice idea. But I don't think that postgres shows in the
> > EXPLAIN output why the plan is selected. Would it be appropriate to
> > show that GEQO is used in EXPLAIN output?
>
> I'm reminded of Greenplum's "Optimizer" line in its EXPLAIN output
> [1], so from that perspective I think it's intuitive.
>
> > As a test, I created a patch that add information about GEQO to
> > EXPLAIN output by the GEQO option. The output example is as follows.
> > What do you think about the location and content of information about
> GEQO?

> I am a little surprised to see GeqoDetails being printed for a plan
> that didn't use GEQO, but again that's probably because I'm used to
> GPDB's Optimizer output. And I don't have a lot of personal experience
> using alternative optimizers.
>

I agree this should be part of explain output.

I would not print the current value of geqo_threshold and leave setting
display the exclusive purview of the settings option.

The presentation of only a single geqo result seems incorrect given that
multiple trees can exist. In the first example below the full outer join
causes 3 relations to be seen as a single relation at the top level (hence
max join nodes = 4) while in the inner join case we see all 6 join nodes.
There should be two outputs of GEQO in the first explain, one with join
nodes of 3 and the existing one with 4.

I also don't see the point of labelling them "max"; "join nodes" seems
sufficient.

While it can probably be figured out from the rest of the plan, listing the
names of the join nodes may be useful (and give join nodes some company).

David J.

postgres=# explain (verbose, geqo) with gs2 (v2) as materialized ( select *
from generate_series(1,1) ) select * from gs2 as gs4 full outer join
(select gs2a.v2 from gs2 as gs2a, gs2 as gs2b) as gs5 using (v2),
generate_series(1, 1) as gs (v1) cross join gs2 as gs3 where v1 IN (select
v2 from gs2);
QUERY PLAN

----------------------------------------------------------------------------------------------------
Nested Loop (cost=0.07..0.21 rows=1 width=12)
Output: COALESCE(gs4.v2, gs2a.v2), gs.v1, gs3.v2
CTE gs2
-> Function Scan on pg_catalog.generate_series (cost=0.00..0.01
rows=1 width=4)
Output: generate_series.generate_series
Function Call: generate_series(1, 1)
-> Nested Loop (cost=0.06..0.16 rows=1 width=12)
Output: gs.v1, gs4.v2, gs2a.v2
-> Nested Loop (cost=0.02..0.06 rows=1 width=4)
Output: gs.v1
Join Filter: (gs.v1 = gs2.v2)
-> Function Scan on pg_catalog.generate_series gs
(cost=0.00..0.01 rows=1 width=4)
Output: gs.v1
Function Call: generate_series(1, 1)
-> HashAggregate (cost=0.02..0.03 rows=1 width=4)
Output: gs2.v2
Group Key: gs2.v2
-> CTE Scan on gs2 (cost=0.00..0.02 rows=1 width=4)
Output: gs2.v2
-> Hash Full Join (cost=0.03..0.10 rows=1 width=8)
Output: gs4.v2, gs2a.v2
Hash Cond: (gs2a.v2 = gs4.v2)
-> Nested Loop (cost=0.00..0.05 rows=1 width=4)
Output: gs2a.v2
-> CTE Scan on gs2 gs2b (cost=0.00..0.02 rows=1
width=0)
Output: gs2b.v2
-> CTE Scan on gs2 gs2a (cost=0.00..0.02 rows=1
width=4)
Output: gs2a.v2
-> Hash (cost=0.02..0.02 rows=1 width=4)
Output: gs4.v2
-> CTE Scan on gs2 gs4 (cost=0.00..0.02 rows=1
width=4)
Output: gs4.v2
-> CTE Scan on gs2 gs3 (cost=0.00..0.02 rows=1 width=4)
Output: gs3.v2
GeqoDetails: GEQO: used, geqo_threshold: 2, Max join nodes: 4
(35 rows)

postgres=# explain (verbose, geqo) with gs2 (v2) as materialized ( select *
from generate_series(1,1) ) select * from gs2 as gs4 join (select gs2a.v2
from gs2 as gs2a, gs2 as gs2b) as gs5 using (v2), generate_series(1, 1) as
gs (v1) cross join gs2 as gs3 where v1 IN (select v2 from gs2);
QUERY PLAN

----------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.02..0.18 rows=1 width=12)
Output: gs4.v2, gs.v1, gs3.v2
CTE gs2
-> Function Scan on pg_catalog.generate_series (cost=0.00..0.01
rows=1 width=4)
Output: generate_series.generate_series
Function Call: generate_series(1, 1)
-> Nested Loop (cost=0.00..0.14 rows=1 width=12)
Output: gs.v1, gs4.v2, gs3.v2
-> Nested Loop (cost=0.00..0.11 rows=1 width=8)
Output: gs.v1, gs4.v2
-> Nested Loop Semi Join (cost=0.00..0.04 rows=1 width=4)
Output: gs.v1
Join Filter: (gs.v1 = gs2.v2)
-> Function Scan on pg_catalog.generate_series gs
(cost=0.00..0.01 rows=1 width=4)
Output: gs.v1
Function Call: generate_series(1, 1)
-> CTE Scan on gs2 (cost=0.00..0.02 rows=1 width=4)
Output: gs2.v2
-> Nested Loop (cost=0.00..0.05 rows=1 width=4)
Output: gs4.v2
Join Filter: (gs2a.v2 = gs4.v2)
-> CTE Scan on gs2 gs2a (cost=0.00..0.02 rows=1
width=4)
Output: gs2a.v2
-> CTE Scan on gs2 gs4 (cost=0.00..0.02 rows=1
width=4)
Output: gs4.v2
-> CTE Scan on gs2 gs3 (cost=0.00..0.02 rows=1 width=4)
Output: gs3.v2
-> CTE Scan on gs2 gs2b (cost=0.00..0.02 rows=1 width=0)
Output: gs2b.v2
GeqoDetails: GEQO: used, geqo_threshold: 2, Max join nodes: 6
(30 rows)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-07-28 00:41:20 Re: fairywren hung in pg_basebackup tests
Previous Message Tom Lane 2022-07-27 23:01:00 Re: Unprivileged user can induce crash by using an SUSET param in PGOPTIONS