Re: Hash or merge join instead of inner loop

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash or merge join instead of inner loop
Date: 2003-06-10 19:42:07
Message-ID: 20030610194206.GK40542@flake.decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Jun 10, 2003 at 02:15:11AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > I have a query that's cauing pgsql choose either a hash or merge join
> > depending on how I mess with the stats variables, but it won't choose an
> > nested loop, even though it's the fastest.
>
> There's been some discussion about that before; you could check the
> archives (now that they're up again ;-)). I believe that the planner
> overestimates the cost of a nestloop with inner indexscan, because it
> costs the indexscans as though each one is an independent ab-initio
> index search. In reality, most of the upper btree levels will no doubt
> stay in memory during such a query, and so this estimate charges many
> more reads than really occur. Fixing this is on the todo list, but no
> one's got to it yet. (It's not clear to me how to put the consideration
> into the planner's cost algorithms in a clean way.)

What about just ignoring all but the leaf pages? Unless you have a
really, really big index, I think this would probably work well, or at
least better than what we have right now.

I can't think of an elegant way to figure out hit percentages either.
Maybe as a ratio of how often an individual page at a given level of the
btree is to be hit? IE: the root page will always be hit (only one
page); if the next level up has 10 pages, each one is 10% likely to be
in cache, and so-on. Or maybe a better way to look at it is how many
pages sit underneath each page. So if we figure there's a 0.1% chance that
a leaf page is in cache and each page in the layer above/below that has
tuples for 100 leaf pages, then the odds of a page in that layer being
in the cache is 10%

It might also be worth giving index pages a higher priority in the
internal buffer than table pages.
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-06-10 19:43:47 Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning
Previous Message Bruce Momjian 2003-06-10 19:08:22 Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning