From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: make add_paths_to_append_rel aware of startup cost
Date: 2024-02-15 12:09:37
Message-ID: CAApHDvogv5MBw6JV82Yb6gAa0mp64HQtX--TSK_wZOn9x2fP4Q@mail.gmail.com
Lists: pgsql-hackers
On Thu, 15 Feb 2024 at 21:42, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> I found that both plans have the same cost. I can't pin down the
> exact cause after some hours of research, but it is quite similar to
> 7516056c584e3, so I used a similar strategy to stabilize it. Is that
> acceptable?
It's pretty hard to say. I can only guess why this test would be
flapping like this. I see it's happened before on mylodon, so it's
probably not a cosmic ray. add_path() doesn't choose a random path
when the costs are the same, so I wondered if something similar to
what led to f03a9ca4 is going on here. In particular, see [1].
On master, I set a breakpoint in try_nestloop_path() to break on
"outerrel->relid==1 && innerrel->relid==2". I see the total Nested
Loop cost comes out the same with the join order reversed.
Which is:
-> Nested Loop (cost=0.00..1500915.00 rows=10000 width=4)
Doing the same with your patch applied, I get:
-> Nested Loop (cost=0.00..600925.00 rows=4000 width=4)
and forcing the join order to swap with the debugger, I see:
-> Nested Loop (cost=0.00..600940.00 rows=4000 width=4)
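For reference, the debugger recipe was roughly the following sketch.
The attach step and the use of gdb's "return" command to force the
swap are shorthand on my part -- one way to do it, not necessarily the
only one -- and <backend pid> is a placeholder:

$ gdb -p <backend pid>
(gdb) break try_nestloop_path if outerrel->relid == 1 && innerrel->relid == 2
(gdb) continue
# When the breakpoint fires for the join order we want to suppress,
# "return" abandons try_nestloop_path() before it reaches add_path(),
# leaving the reversed order as the surviving nested loop path:
(gdb) return
(gdb) continue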
So there's a difference now, but it's quite small. If this is a
problem like the one in [1], then since tenk1 and tenk2 have 345 pages
(on my machine), relpages coming out 1 or 2 pages lower would likely
cause a costing shift larger than the 600925 vs 600940 gap, which
could flip the plan again.
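As a quick sanity check, the page and tuple counts the planner works
from can be read straight out of pg_class with something like:

select relname, relpages, reltuples
from pg_class
where relname in ('tenk1', 'tenk2');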
If I use:
explain
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
union all
(values(1)) limit 1;
I get:
-> Nested Loop (cost=0.00..2415.03 rows=10 width=4)
and with the join order reversed, I get:
-> Nested Loop (cost=0.00..2440.00 rows=10 width=4)
I'd be happier using this one since, percentage-wise, the cost
difference is much larger. I don't quite have the will to go through
proving what the actual problem is here; I think [1] already proved
the relpages problem can (or could) happen.
I checked that the t2.thousand = 0 query still tests the cheap
startup paths in add_paths_to_append_rel(), and it does. If I flip
startup_subpaths_valid to false in the debugger, the plan flips to:
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=470.12..514.00 rows=1 width=4)
   ->  Append  (cost=470.12..952.79 rows=11 width=4)
         ->  Hash Join  (cost=470.12..952.73 rows=10 width=4)
               Hash Cond: (t1.tenthous = t2.tenthous)
               ->  Seq Scan on tenk1 t1  (cost=0.00..445.00 rows=10000 width=8)
               ->  Hash  (cost=470.00..470.00 rows=10 width=4)
                     ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4)
                           Filter: (thousand = 0)
         ->  Result  (cost=0.00..0.01 rows=1 width=4)
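The flip itself is just overwriting the local flag once the per-child
loop in add_paths_to_append_rel() has run, along the lines of the
sketch below; the exact breakpoint line depends on the source tree, so
treat it as a placeholder:

$ gdb -p <backend pid>
(gdb) break allpaths.c:<line just after the per-subpath loop>
(gdb) continue
# Force the flag off so the cheap-startup append path isn't built:
(gdb) set var startup_subpaths_valid = 0
(gdb) continue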
So, if nobody has any better ideas, I'm just going to push the " and
t2.thousand = 0" adjustment.
David
[1] https://www.postgresql.org/message-id/4174.1563239552%40sss.pgh.pa.us