Re: Startup cost of sequential scan

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Startup cost of sequential scan
Date: 2018-08-30 15:38:04
Message-ID: CAPpHfdtSL=WSUkmAqLMDyBzKguqwb3vfWPs=c6NwREyqijZ=bQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 30, 2018 at 5:58 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> > On Thu, Aug 30, 2018 at 5:05 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Because it's what the mental model of startup cost says it should be.
>
> > From this model we make a conclusion that we're starting getting rows
> > from sequential scan sooner than from index scan. And this conclusion
> > doesn't reflect reality.
>
> No, startup cost is not the "time to find the first row". It's overhead
> paid before you even get to start examining rows.
>
> I'm disinclined to consider fundamental changes to our costing model
> on the basis of this example. The fact that the rowcount estimates are
> so far off reality means that you're basically looking at "garbage in,
> garbage out" for the cost calculations --- and applying a small LIMIT
> just magnifies that.
>
> It'd be more useful to think first about how to make the selectivity
> estimates better; after that, we might or might not still think there's
> a costing issue.

I understand that startup cost is not "time to find the first row".
But I think this example highlights not one but two issues.
1) Row count estimates for joins are wrong.
2) Rows are assumed to be continuous, while in reality they are
discrete. So, if we reverse the assumptions made in LIMIT clause
estimation, we may say that it basically assumes we need to
fetch only a fraction of a row from the sequential scan node. And in
the case where we really fetch 101 rows in each join with t2, this
logic would still bring us to the bad plan. I'm not proposing that we
rush into redesigning the planner to fix that. I just think it's
probably something worth discussing.
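
To make point 2 concrete, here is a minimal sketch of the arithmetic,
assuming LIMIT costing scales a node's run cost linearly by the fraction
of its output that is expected to be fetched. The function names, the
outer row count, and the LIMIT value are hypothetical and chosen only for
illustration; only the figure of 101 rows per join comes from the example
above. This is not planner code.

```python
import math


def interpolated_cost(startup_cost, total_cost, fraction):
    """Cost of fetching `fraction` of a node's output, interpolating
    linearly between startup cost and total cost."""
    fraction = min(1.0, max(0.0, fraction))
    return startup_cost + (total_cost - startup_cost) * fraction


def outer_rows_needed(outer_rows, rows_per_outer, limit_rows):
    """Outer (sequential scan) rows needed to satisfy the LIMIT.

    Returns both the continuous estimate, which may be a fraction of a
    row, and a discrete estimate rounded up to at least one whole row.
    """
    join_rows = outer_rows * rows_per_outer
    fraction = min(1.0, limit_rows / join_rows)
    continuous = outer_rows * fraction
    discrete = max(1, math.ceil(continuous))
    return continuous, discrete


# Hypothetical numbers: 10,000 outer rows and LIMIT 10.
# 101 matching rows per outer row is the figure from the example.
rows_continuous, rows_discrete = outer_rows_needed(10_000, 101, 10)
print(rows_continuous, rows_discrete)
```

Under these assumptions the continuous model charges the sequential scan
for about a tenth of a row, while at least one whole row has to be
fetched in practice.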

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
