Re: osdl-dbt3 run results - puzzled by the execution

From: Jenny Zhang <jenny(at)osdl(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: perf-pgsql <pgsql-performance(at)postgresql(dot)org>
Subject: Re: osdl-dbt3 run results - puzzled by the execution
Date: 2003-09-19 21:35:41
Message-ID: 1064007341.442.45.camel@ibm-a.pdx.osdl.net
Lists: pgsql-hackers pgsql-performance

On Thu, 2003-09-18 at 20:20, Tom Lane wrote:
> Jenny Zhang <jenny(at)osdl(dot)org> writes:
> > ... It seems to me that small
> > effective_cache_size favors the choice of nested loop joins (NLJ)
> > while the big effective_cache_size is in favor of merge joins (MJ).
>
> No, I wouldn't think that, because a nestloop plan will involve repeated
> fetches of the same tuples whereas a merge join doesn't (at least not
> when it sorts its inner input, as this plan does). Larger cache
> improves the odds of a repeated fetch not having to do I/O. In practice
> a larger cache area would also have some effects on access costs for the
> sort's temp file, but I don't think the planner's cost model for sorting
> takes that into account.
I think there is some misunderstanding here. What I meant to say is:
from the plans we got, the optimizer favors nested loop joins (NLJ)
when effective_cache_size is small and merge joins (MJ) when
effective_cache_size is big, which we think is not appropriate. We
verified that sort_mem has no impact on the plans, though it would be
nice if the planner took it into account.
>
> As Matt Clark points out nearby, the real question is whether these
> planner estimates have anything to do with reality. EXPLAIN ANALYZE
> results would be far more interesting than plain EXPLAIN.
>
> > However, within the same run set consisting of 6 runs, we see 2-3%
> > standard deviation for the run metrics associated with the multiple
> > stream part of the test (as opposed to the single stream part).
>
> <python> Och, laddie, we useta *dream* of 2-3% variation </python>
>
BTW, I am a she :-)
> > We would like to reduce the variation to be less than 1% so that a
> > 2% change between two different kernels would be significant.
>
> I think this is a pipe dream. Variation in where the data gets laid
> down on your disk drive would alone create more than that kind of delta.
> I'm frankly amazed you could get repeatability within 2-3%.
>
Greg is right. The repeatability comes from aggregating the results over
a whole test run. For individual queries, the power test (single stream)
is very consistent, while in the throughput test (multiple streams) any
given query's execution time varies by up to 15% when there is no
swapping. If we set sort_mem too high and swapping occurs, the variation
is even bigger.
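
For reference, here is a minimal sketch of how the run-to-run variation
and a kernel-to-kernel difference could be quantified. The helpers
relative_stddev, percent_change and welch_t are hypothetical (they are
not part of osdl-dbt3), the metric values are made-up placeholders
rather than results from our runs, and Welch's t test is just one
common, assumed choice for comparing two small run sets.

```python
import statistics


def relative_stddev(runs):
    """Sample standard deviation as a percentage of the mean (coefficient of variation)."""
    return 100.0 * statistics.stdev(runs) / statistics.mean(runs)


def percent_change(baseline, candidate):
    """Change of the candidate's mean metric relative to the baseline's mean, in percent."""
    base = statistics.mean(baseline)
    return 100.0 * (statistics.mean(candidate) - base) / base


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two independent run sets."""
    se_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    se_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    return (statistics.mean(b) - statistics.mean(a)) / (se_a + se_b) ** 0.5


if __name__ == "__main__":
    # Hypothetical metrics for two 6-run sets; NOT data from the dbt3 runs.
    kernel_a = [100.0, 103.0, 97.0, 101.0, 99.0, 102.0]
    kernel_b = [102.0, 105.0, 99.0, 104.0, 101.0, 103.0]
    print(f"relative stddev A: {relative_stddev(kernel_a):.2f}%")
    print(f"relative stddev B: {relative_stddev(kernel_b):.2f}%")
    print(f"change in mean:    {percent_change(kernel_a, kernel_b):.2f}%")
    print(f"Welch t statistic: {welch_t(kernel_a, kernel_b):.2f}")
```

With these made-up numbers, each run set has about 2.15% relative
standard deviation and the means differ by about 2%, yet the t statistic
is only about 1.6, below the roughly 2.23 two-sided 5% critical value
for 10 degrees of freedom. That is why a 2% kernel difference would not
stand out from 2-3% run-to-run variation with only 6 runs per set.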

Jenny
