On Tue, Feb 28, 2012 at 10:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The flaw in this logic, of course, is that the seqscan might be cheaper
> than the parameterized indexscan, but *it produces a whole lot more
> rows*, meaning that any subsequent join will be a lot more expensive.
> Previously add_path didn't have to worry about that, because all
> ordinary paths for a given relation produce the same number of rows
> (and we studiously kept things like inner indexscan paths out of
> add_path's view of the world).
> The most obvious thing to do about this is to consider that one path can
> dominate another on cost only if it is both cheaper *and* produces no
> more rows. But I'm concerned about the cost of inserting yet another
> test condition into add_path, which is slow enough already. Has anyone
> got an idea for a different formulation that would avoid that?
It seems clear that we shouldn't be making that decision at that point.
It would be better to default towards processing fewer rows initially
and then swoop in later to improve decision making on larger plans.
Can't we save the SeqScan costs at every node, then re-add SeqScan
plans as a post-processing step iff the index/nestd loops plans appear
costly? So have an additional post processing step that only cuts in
with larger plans.
Seqscan plans are bad for many reasons, such as pushing data out of
cache, making the result more sensitive to growing data volumes or
selectivity mistakes as well as producing confusing stats for people
trying to add the right indexes.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
In response to
pgsql-hackers by date
|Next:||From: Daniel Farina||Date: 2012-02-29 09:05:01|
|Subject: Re: Re: pg_stat_statements normalisation without invasive
changes to the parser (was: Next steps on pg_stat_statements normalisation)|
|Previous:||From: Shigeru Hanada||Date: 2012-02-29 05:24:43|
|Subject: Re: FDW system columns|