Re: Analyzer is clueless

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: David Brown <time(at)bigpond(dot)net(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Analyzer is clueless
Date: 2004-11-17 21:02:59
Message-ID: 419BBC83.7020802@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

Have you tried increasing the statistics target for orderdate and
rerunning analyze?

Sincerely,

Joshua D. Drake

David Brown wrote:
> I'm doing some performance profiling with a simple two-table query:
>
> SELECT L."ProductID", sum(L."Amount")
> FROM "drinv" H
> JOIN "drinvln" L ON L."OrderNo" = H."OrderNo"
> WHERE
> ("OrderDate" between '2003-01-01' AND '2003-04-30')
> GROUP BY L."ProductID"
>
> drinv and drinvln have about 100,000 and 3,500,000 rows respectively. Actual data size in the large table is 500-600MB. OrderNo is indexed in both tables, as is OrderDate.
>
> The environment is PGSQL 8 on Win2k with 512MB RAM (results are similar to 7.3 from Mammoth). I've tried tweaking various conf parameters, but apart from using up memory, nothing seems to have had a tangible effect - the Analyzer doesn't seem to take resources into account like some of the doco suggests.
>
> The date selection represents about 5% of the range. Here's the plan summaries:
>
> Three months (2003-01-01 to 2003-03-30) = 1 second
>
> HashAggregate (cost=119365.53..119368.74 rows=642 width=26)
> -> Nested Loop (cost=0.00..118791.66 rows=114774 width=26)
> -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..200.27 rows=3142 width=8)
> Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-03-30'::date))
> -> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..28.73 rows=721 width=34)
> Index Cond: (l."OrderNo" = "outer"."OrderNo")
>
>
> Four months (2003-01-01 to 2003-04-30) = 60 seconds
>
> HashAggregate (cost=126110.53..126113.74 rows=642 width=26)
> -> Hash Join (cost=277.55..125344.88 rows=153130 width=26)
> Hash Cond: ("outer"."OrderNo" = "inner"."OrderNo")
> -> Seq Scan on drinvln l (cost=0.00..106671.35 rows=3372935 width=34)
> -> Hash (cost=267.07..267.07 rows=4192 width=8)
> -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8)
> Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date))
>
>
> Four months (2003-01-01 to 2003-04-30) with Seq_scan disabled = 75 seconds
>
>
> HashAggregate (cost=130565.83..130569.04 rows=642 width=26)
> -> Merge Join (cost=519.29..129800.18 rows=153130 width=26)
> Merge Cond: ("outer"."OrderNo" = "inner"."OrderNo")
> -> Sort (cost=519.29..529.77 rows=4192 width=8)
> Sort Key: h."OrderNo"
> -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8)
> Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date))
> -> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..119296.29 rows=3372935 width=34)
>
> Statistics were run on each table before query execution. The random page cost was lowered to 2, but as you can see, the estimated costs are wild anyway.
>
> As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster.
>
> MySQL (InnoDB) took 2 seconds, which is 30 times faster.
>
> The query looks straightforward to me (it might be clearer with a subselect), so what on earth is wrong?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd(at)commandprompt(dot)com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment Content-Type Size
jd.vcf text/x-vcard 640 bytes

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2004-11-17 21:07:06 Re: memcached and PostgreSQL
Previous Message Michael Adler 2004-11-17 19:51:58 Re: memcached and PostgreSQL