match_unsorted_outer() vs. cost_nestloop()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: match_unsorted_outer() vs. cost_nestloop()
Date: 2009-09-05 01:02:10
Message-ID: 603c8f070909041802p18ed2fb1v91245ccfb5c2a24a@mail.gmail.com
Lists: pgsql-hackers

In joinpath.c, match_unsorted_outer() considers materializing the
inner side of each nested loop if the inner path is not an index scan,
bitmap heap scan, tid scan, material path, function scan, CTE scan, or
worktable scan. In costsize.c, cost_nestloop() charges the startup
cost only once if the inner path is a hash path or material path;
otherwise, it charges it for every anticipated rescan.
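To make the comparison concrete, here is a small Python sketch that encodes the two lists as described above and reports where they disagree. The PathKind labels, the SEQ_SCAN control case, and the set names are hypothetical stand-ins rather than PostgreSQL identifiers. The sets reflect only this message's summary of joinpath.c and costsize.c, not the actual source.

```python
from enum import Enum, auto


class PathKind(Enum):
    # Hypothetical labels for the inner-path types named in the message;
    # SEQ_SCAN is an extra control case that appears in neither list.
    INDEX_SCAN = auto()
    BITMAP_HEAP_SCAN = auto()
    TID_SCAN = auto()
    MATERIAL = auto()
    FUNCTION_SCAN = auto()
    CTE_SCAN = auto()
    WORKTABLE_SCAN = auto()
    HASH_JOIN = auto()
    SEQ_SCAN = auto()


# Inner paths for which match_unsorted_outer() does NOT consider adding a
# materialize node (per the description above, not the real source).
SKIP_MATERIALIZE = frozenset({
    PathKind.INDEX_SCAN,
    PathKind.BITMAP_HEAP_SCAN,
    PathKind.TID_SCAN,
    PathKind.MATERIAL,
    PathKind.FUNCTION_SCAN,
    PathKind.CTE_SCAN,
    PathKind.WORKTABLE_SCAN,
})

# Inner paths whose startup cost cost_nestloop() charges only once
# (again per the description above).
STARTUP_ONCE = frozenset({PathKind.HASH_JOIN, PathKind.MATERIAL})


def disagreements():
    """Return the kinds that appear in exactly one of the two lists."""
    return set(SKIP_MATERIALIZE ^ STARTUP_ONCE)


if __name__ == "__main__":
    for kind in sorted(disagreements(), key=lambda k: k.name):
        print(kind.name,
              "skip-materialize:", kind in SKIP_MATERIALIZE,
              "startup-once:", kind in STARTUP_ONCE)
```

The symmetric difference only shows where the two lists disagree; it does not decide which of them should change.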

It seems to me, perhaps naively, that the criteria used in these two
places differ more than they should. For example,
function scan nodes insert their results into a tuplestore so that
rescans get the same set of tuples, which is why we don't consider
inserting a materialize node over them in match_unsorted_outer() - but
I think that also means that we oughtn't to be counting the startup
cost for every rescan.
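A rough way to see what is at stake: the following Python sketch uses a deliberately simplified inner-side cost model (not the actual cost_nestloop() arithmetic) in which each of `loops` inner scans pays the run cost, and the startup cost is paid either once or on every scan. The function name inner_side_cost and the numbers are made up for illustration.

```python
def inner_side_cost(startup, run, loops, startup_once):
    """Simplified cost of scanning the inner side `loops` times.

    startup_once=True models a node (e.g. one backed by a tuplestore) whose
    startup work is done on the first scan only; False models a node whose
    startup cost is paid again on every rescan.
    """
    if loops < 1:
        raise ValueError("loops must be at least 1")
    startup_total = startup if startup_once else startup * loops
    return startup_total + run * loops


if __name__ == "__main__":
    # Hypothetical function scan: startup 100, run 10, 50 outer rows.
    print(inner_side_cost(100, 10, 50, startup_once=False))  # 5500
    print(inner_side_cost(100, 10, 50, startup_once=True))   # 600
```

With these made-up numbers, charging the startup cost on every rescan inflates the estimate by roughly a factor of nine.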

I'm not exactly sure which ones should match or not match. Hash
paths, maybe, shouldn't. I believe the reason we don't count the
startup cost of the hash path again is that we're assuming it's
attributable to the cost of building the hash table, which only needs
to be done once. I don't think that's 100% accurate, because the hash
path could have inherited some of that cost from its underlying paths.
At any rate, it's conceivable that materializing could be enough
cheaper than repeating the join that a materialize node makes sense.
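The hash-path concern can be sketched the same way. This Python sketch assumes, purely for illustration, that a hash join's startup cost splits into a one-time hash-table build plus startup cost inherited from an underlying path that has to be redone on every rescan. All function names, parameter names, and values are hypothetical, and the cost of materializing is simplified to a one-time write plus a flat per-rescan read.

```python
def charged_as_once(build, inherited, run, loops):
    # What "charge startup only once" assumes: all startup cost is paid once.
    return build + inherited + run * loops


def repeat_join(build, inherited, run, loops):
    # Hash table built once, but inherited startup is paid on every scan.
    return build + (inherited + run) * loops


def materialized(build, inherited, run, loops, write, reread):
    # First scan runs the join and writes its output to a materialize node;
    # later rescans only reread the stored tuples.
    return build + inherited + run + write + reread * (loops - 1)


if __name__ == "__main__":
    args = dict(build=1000, inherited=200, run=300, loops=10)
    print(charged_as_once(**args))                     # 4200
    print(repeat_join(**args))                         # 6000
    print(materialized(**args, write=100, reread=50))  # 2050
```

For these particular numbers, the once-only charge understates the cost of repeating the join, and a materialize node over the hash join comes out cheaper than either figure.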

Thoughts?

...Robert
