Re: match_unsorted_outer() vs. cost_nestloop()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: match_unsorted_outer() vs. cost_nestloop()
Date: 2009-09-06 00:19:19
Message-ID: 10464.1252196359@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I guess my point is that for node types that dump their output into a
> tuplestore anyway, it doesn't seem like cost_nestloop() should charge
> n * the startup cost. I believe that at least function, CTE, and
> worktable scans fall into this category. No?

Yeah, probably. The comment is correct as is:

* their sum. What's not so clear is whether the inner path's
* startup_cost must be paid again on each rescan of the inner path. This
* is not true if the inner path is materialized or is a hashjoin, but
* probably is true otherwise.

What's not correct is the code's expansion of "is materialized" as
"is a MaterialPath". However, I'm not sure it's worth just adding
these other tuplestore-using types to the list. We really ought
to think a bit harder about representing the difference between
initial scan cost and rescan cost.

It might be sufficient to have cost_nestloop just hardwire the knowledge
that certain inner path types have a different behavior here --- that
is, for a rescan there is zero start cost and some very low per-tuple
cost, independent of the path's nominal cost values (which would now
be defined as always the costs for the first scan). And maybe the same
in cost_mergejoin. Offhand I don't think anyplace else really needs to
think about rescan costs.
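
For illustration only, here is a rough C sketch of what "hardwiring" that knowledge could look like. It is not the actual costsize.c code: the SketchPath struct, the SketchPathType enum, every sketch_* name, and the per-tuple rescan charge of 0.0025 are assumptions made up for this example. The path's nominal costs are treated as first-scan costs, and tuplestore-backed inner paths get a zero startup cost and a small per-tuple cost on each rescan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not PostgreSQL's real Path or NodeTag types. */
typedef enum
{
    SKETCH_SEQSCAN,
    SKETCH_MATERIAL,
    SKETCH_FUNCTIONSCAN,
    SKETCH_CTESCAN,
    SKETCH_WORKTABLESCAN
} SketchPathType;

typedef struct
{
    SketchPathType type;
    double      startup_cost;   /* nominal cost: first scan only */
    double      total_cost;     /* nominal cost: first scan only */
    double      rows;           /* estimated rows returned per scan */
} SketchPath;

/* Assumed placeholder for the "very low per-tuple cost" of a rescan. */
static const double sketch_rescan_tuple_cost = 0.0025;

/* Path types assumed to replay their output from a tuplestore on rescan. */
static int
sketch_rescans_from_tuplestore(SketchPathType type)
{
    switch (type)
    {
        case SKETCH_MATERIAL:
        case SKETCH_FUNCTIONSCAN:
        case SKETCH_CTESCAN:
        case SKETCH_WORKTABLESCAN:
            return 1;
        default:
            return 0;
    }
}

/* Cost of one rescan, separate from the path's nominal first-scan costs. */
static void
sketch_rescan_cost(const SketchPath *inner,
                   double *rescan_startup, double *rescan_total)
{
    assert(inner != NULL && rescan_startup != NULL && rescan_total != NULL);

    if (sketch_rescans_from_tuplestore(inner->type))
    {
        /* Output is already stored: no startup, cheap per-tuple fetch. */
        *rescan_startup = 0.0;
        *rescan_total = sketch_rescan_tuple_cost * inner->rows;
    }
    else
    {
        /* Otherwise assume the whole scan is paid for again. */
        *rescan_startup = inner->startup_cost;
        *rescan_total = inner->total_cost;
    }
}

/* Inner-side cost of a nestloop: one first scan plus (outer_rows - 1) rescans. */
static double
sketch_nestloop_inner_cost(const SketchPath *inner, double outer_rows)
{
    double      rescan_startup;
    double      rescan_total;

    if (outer_rows < 1.0)
        outer_rows = 1.0;       /* the inner path is scanned at least once */

    sketch_rescan_cost(inner, &rescan_startup, &rescan_total);
    return inner->total_cost + (outer_rows - 1.0) * rescan_total;
}
```

With these made-up numbers, a function scan with total cost 20 and 1000 rows under 5 outer rows comes to roughly 20 + 4 * 2.5 = 30, whereas a plain scan with the same nominal costs comes to 20 + 4 * 20 = 100. A cost_mergejoin variant could presumably reuse the same sketch_rescan_cost helper.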

I think this would be enough to deal with the issue for those plan types
that materialize their output, because they all have about the same
runtime behavior in this regard. What gets more exciting is if you'd
like to model other effects this way --- for example, the one that
rescanning an indexscan is probably a lot cheaper than the original
fetch because of caching effects. But we already have that sort of
thing accounted for (to some extent anyway) elsewhere, so I think we
can probably ignore it here.

regards, tom lane
