Re: make add_paths_to_append_rel aware of startup cost

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: make add_paths_to_append_rel aware of startup cost
Date: 2024-02-15 12:09:37
Message-ID: CAApHDvogv5MBw6JV82Yb6gAa0mp64HQtX--TSK_wZOn9x2fP4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 15 Feb 2024 at 21:42, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> I found the both plans have the same cost, I can't get the accurate
> cause of this after some hours research, but it is pretty similar with
> 7516056c584e3, so I uses a similar strategy to stable it. is it
> acceptable?

It's pretty hard to say. I can only guess why this test would be
flapping like this. I see it's happened before on mylodon, so probably
not a cosmic ray. It's not like add_path() chooses a random path when
the costs are the same, so I wondered if something similar is going on
here that was going on that led to f03a9ca4. In particular, see [1].

On master, I set a breakpoint in try_nestloop_path() to break on
"outerrel->relid==1 && innerrel->relid==2". I see the total Nested
Loop cost comes out the same with the join order reversed.

Which is:

-> Nested Loop (cost=0.00..1500915.00 rows=10000 width=4)

Doing the same with your patch applied, I get:

-> Nested Loop (cost=0.00..600925.00 rows=4000 width=4)

and forcing the join order to swap with the debugger, I see:

-> Nested Loop (cost=0.00..600940.00 rows=4000 width=4)

So there's a difference now, but it's quite small. If it was a problem
like we had on [1], then since tenk1 and tenk2 have 345 pages (on my
machine), if relpages is down 1 or 2 pages, we'll likely get more of a
costing difference than 600925 vs 600940.

If I use:

explain
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
union all
(values(1)) limit 1;

I get:

-> Nested Loop (cost=0.00..2415.03 rows=10 width=4)

and with the join order reversed, I get:

-> Nested Loop (cost=0.00..2440.00 rows=10 width=4)

I'd be more happy using this one as percentage-wise, the cost
difference is much larger. I don't quite have the will to go through
proving what the actual problem is here. I think [1] already proved
the relpages problem can (or could) happen.

I checked that the t2.thounsand = 0 query still tests the cheap
startup paths in add_paths_to_append_rel() and it does. If I flip
startup_subpaths_valid to false in the debugger, the plan flips to:

QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=470.12..514.00 rows=1 width=4)
-> Append (cost=470.12..952.79 rows=11 width=4)
-> Hash Join (cost=470.12..952.73 rows=10 width=4)
Hash Cond: (t1.tenthous = t2.tenthous)
-> Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=10000 width=8)
-> Hash (cost=470.00..470.00 rows=10 width=4)
-> Seq Scan on tenk2 t2 (cost=0.00..470.00
rows=10 width=4)
Filter: (thousand = 0)
-> Result (cost=0.00..0.01 rows=1 width=4)

So, if nobody has any better ideas, I'm just going to push the " and
t2.thousand = 0" adjustment.

David

[1] https://www.postgresql.org/message-id/4174.1563239552%40sss.pgh.pa.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-02-15 12:11:46 Re: Add system identifier to backup manifest
Previous Message Ashutosh Bapat 2024-02-15 12:06:16 Re: Memory consumed by paths during partitionwise join planning