Quick Links

Re: Optimizer(?) off by factor of 3 ... ?

From:	"Marc G(dot) Fournier" <scrappy(at)hub(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimizer(?) off by factor of 3 ... ?
Date:	2002-02-13 13:16:15
Message-ID:	20020213091531.F19107-100000@mail1.hub.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Okay, I've 'saved' the dataset/schema for this if you want me to test/try
anything further with it, as I'm going to try and redo teh schema to get
around the issues for now ...

On Mon, 11 Feb 2002, Tom Lane wrote:

> "Marc G. Fournier" <scrappy(at)hub(dot)org> writes:
> > [ bogus optimizer choices in 7.2 ]
>
> Well, I guess the good news is we seem to be past the old bugaboo of bad
> statistics: the estimated row counts are all in the right ballpark. Now
> we get to have fun with the cost models :-).
>
> It looks to me like there are a couple of problems here. One is that
> the default value of effective_cache_size is way too small --- it's set
> at 1000, which is probably silly when you have NBuffers set to 32768.
> (In hindsight maybe we should have expressed it as a multiple of
> NBuffers rather than an absolute size.) You could tweak that with a
> postgresql.conf change, but I'm not sure that that alone will help much.
>
> The more difficult issue is that nestloops with inner indexscan are
> being seriously misestimated. We're computing the cost as though each
> iteration of the inner scan were completely independent and being done
> from a standing start --- which is wrong, because in practice scans
> after the first will tend to find buffer cache hits for pages already
> read in by prior scans. You can bet, for example, that the btree
> metapage and root page aren't going to need to be re-read on each
> iteration.
>
> I am thinking that the right way to do this is to cost the entire inner
> indexscan (all iterations put together) as if it were a single
> indexscan, at least for the purposes of applying the Mackert & Lohman
> formula embedded in cost_index. That would give us a more realistic
> result for the total cost of the main-table accesses driven by the
> index. Not sure how to adjust the cost estimate for reading the index,
> but clearly we need to make some adjustment for repeated hits on the
> upper index pages.
>
> This is probably a bigger change than we can hope to make in 7.2.* ...
>
> BTW, what do you get if you EXPLAIN ANALYZE that orient/clubs join
> with seqscan enabled and hashjoin disabled? If it's a mergejoin,
> how about if you also disable mergejoin? It seems to me that a seqscan
> on clubs would be a much better way to do the nestloop join than an
> indexscan --- but it's being forced into an indexscan because you
> disabled seqscan.
>
> regards, tom lane
>

In response to

Re: Optimizer(?) off by factor of 3 ... ? at 2002-02-12 03:27:58 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Marc G. Fournier	2002-02-13 13:31:30	pg_stats explained ... ?
Previous Message	Martín Marqués	2002-02-13 13:09:20	Re: benchmarking postgres