Re: osdl-dbt3 run results - puzzled by the execution

From: Jenny Zhang <jenny(at)osdl(dot)org>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: perf-pgsql <pgsql-performance(at)postgresql(dot)org>, Matt Clark <matt(at)ymogen(dot)net>
Subject: Re: osdl-dbt3 run results - puzzled by the execution
Date: 2003-09-19 18:35:35
Message-ID: 1063996535.32392.9.camel@ibm-a.pdx.osdl.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

I posted more results as you requested:

On Fri, 2003-09-19 at 08:08, Manfred Koizar wrote:
> On Thu, 18 Sep 2003 15:36:50 -0700, Jenny Zhang <jenny(at)osdl(dot)org>
> wrote:
> >We thought the large effective_cache_size should lead us to better
> >plans. But we found the opposite.
>
> The common structure of your query plans is:
>
> Sort
> Sort Key: sum((partsupp.ps_supplycost * partsupp.ps_availqty))
> InitPlan
> -> Aggregate
> -> SubPlan
> -> Aggregate
> Filter: (sum((ps_supplycost * ps_availqty)) > $0)
> -> Group
> -> Sort
> Sort Key: partsupp.ps_partkey
> -> SubPlan (same as above)
>
> where the SubPlan is
>
> -> Merge Join (cost=519.60..99880.05 rows=32068 width=65)
> (actual time=114.78..17435.28 rows=30400 loops=1)
> ctr=5.73
> Merge Cond: ("outer".ps_suppkey = "inner".s_suppkey)
> -> Index Scan using i_ps_suppkey on partsupp
> (cost=0.00..96953.31 rows=801712 width=34)
> (actual time=0.42..14008.92 rows=799361 loops=1)
> ctr=6.92
> -> Sort (cost=519.60..520.60 rows=400 width=31)
> (actual time=106.88..143.49 rows=30321 loops=1)
> ctr=3.63
> Sort Key: supplier.s_suppkey
> -> SubSubPlan
>
> for large effective_cache_size and
>
> -> Nested Loop (cost=0.00..130168.30 rows=32068 width=65)
> (actual time=0.56..1374.41 rows=30400 loops=1)
> ctr=94.71
> -> SubSubPlan
> -> Index Scan using i_ps_suppkey on partsupp
> (cost=0.00..323.16 rows=80 width=34)
> (actual time=0.16..2.98 rows=80 loops=380)
> ctr=108.44
> Index Cond: (partsupp.ps_suppkey = "outer".s_suppkey)
>
> for small effective_cache_size. Both subplans have an almost
> identical subsubplan:
>
> -> Nested Loop (cost=0.00..502.31 rows=400 width=31)
> (actual time=0.23..110.51 rows=380 loops=1)
> ctr=4.55
> Join Filter: ("inner".s_nationkey = "outer".n_nationkey)
> -> Seq Scan on nation (cost=0.00..1.31 rows=1 width=10)
> (actual time=0.08..0.14 rows=1 loops=1)
> ctr=9.36
> Filter: (n_name = 'ETHIOPIA'::bpchar)
> -> Seq Scan on supplier (cost=0.00..376.00 rows=10000 width=21)
> (actual time=0.10..70.72 rows=10000 loops=1)
> ctr=5.32
>
> I have added the ctr (cost:time ratio) for each plan node. These
> values are mostly between 5 and 10 with two notable exceptions:
>
> 1) -> Sort (cost=519.60..520.60 rows=400 width=31)
> (actual time=106.88..143.49 rows=30321 loops=1)
> ctr=3.63
>
> It has already been noticed by Matt Clark that this is the only plan
> node where the row count estimation looks wrong. However, I don't
> believe that this has great influence on the total cost of the plan,
> because the ctr is not far from the usual range and if it were a bit
> higher, it would only add a few hundred cost units to a branch costing
> almost 100000 units. BTW I vaguely remember that there is something
> strange with the way actual rows are counted inside a merge join.
> Look at the branch below this plan node: It shows an actual row count
> of 380.
>
> 2) -> Index Scan using i_ps_suppkey on partsupp
> (cost=0.00..323.16 rows=80 width=34)
> (actual time=0.16..2.98 rows=80 loops=380)
> ctr=108.44
>
> Here we have the only plan node where loops > 1, and it is the only
> one where the ctr is far off. The planner computes the cost for one
> loop and multiplies it by the number of loops (which it estimates
> quite accurately to be 400), thus getting a total cost of ca. 130000.
> We have no reason to believe that the single loop cost is very far
> from reality (for a *single* index scan), but the planner does not
> account for additional index scans hitting pages in the cache that
> have been brought in by preceding scans. This is a known problem, Tom
> has mentioned it several times, IIRC.
>
> Now I'm very interested in getting a better understanding of this
> problem, so could you please report the results of
>
> . \d i_ps_suppkey
>
http://developer.osdl.org/~jenny/pgsql-optimizer/disc_i_ps_suppkey
> . VACUUM VERBOSE ANALYSE partsupp;
> VACUUM VERBOSE ANALYSE supplier;
>
http://developer.osdl.org/~jenny/pgsql-optimizer/vacuum_verbose_analyze_partsupp
http://developer.osdl.org/~jenny/pgsql-optimizer/vacuum_verbose_analyze_suppler
> . SELECT attname, null_frac, avg_witdh, n_distinct, correlation
> FROM pg_stats
> WHERE tablename = 'partsupp' AND attname IN ('ps_suppkey', ...);
>
> Please insert other interesting column names for ..., especially
> those contained in i_ps_suppkey, if any.
>
I put all the related columns
http://developer.osdl.org/~jenny/pgsql-optimizer/info_partsupp_col

> . SELECT relname, relpages, reltuples
> FROM pg_class
> WHERE relname IN ('partsupp', 'supplier', ...);
> ^^^
> Add relevant index names here.
>
I put all the related tables
http://developer.osdl.org/~jenny/pgsql-optimizer/info_table

> . EXPLAIN ANALYSE
> SELECT ps_partkey, ps_supplycost, ps_availqty
> FROM partsupp, supplier
> WHERE ps_suppkey = s_suppkey AND s_nationkey = '<youknowit>';
>
> The idea is to eliminate parts of the plan that are always the same.
> Omitting nation is possibly to much a simplification. In this case
> please re-add it.
> Do this test for small and large effective_cache_size.
> Force the use of other join methods by setting enable_<joinmethod>
> to off. Post all results.
>
http://developer.osdl.org/~jenny/pgsql-optimizer/explain_query_mk
>
> Jenny, I understand that this long message contains more questions
> than answers and is not of much help for you. OTOH your tests might
> be very helpful for Postgres development ...
Let me know if you need anything else

Jenny

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-09-19 19:57:42 pgsql-server/src/backend catalog/index.c comma ...
Previous Message chakkara rangarajan 2003-09-19 16:56:27 Re: change of table name - any help

Browse pgsql-performance by date

  From Date Subject
Next Message Jenny Zhang 2003-09-19 21:35:41 Re: osdl-dbt3 run results - puzzled by the execution
Previous Message Manfred Koizar 2003-09-19 15:08:26 Re: osdl-dbt3 run results - puzzled by the execution plans