Re: -HEAD planner issue wrt hash_joins on dbt3 ?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Matteo Beccati <php(at)beccati(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date: 2006-09-26 21:53:28
Message-ID: 570.1159307608@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> Tom Lane wrote:
>> It evidently thinks that most of the rows in the join of part and
>> partsupp won't have any matching rows in lineitem, whereas on average
>> there are about 7 matching rows apiece. So that's totally wacko, and
>> it's not immediately obvious why. Could we see the pg_stats entries for
>> part.p_partkey, partsupp.ps_partkey, partsupp.ps_suppkey,
>> lineitem.l_partkey, lineitem.l_suppkey?

> http://www.kaltenbrunner.cc/files/dbt3_beta1_statistics.txt

OK, so we have 2 million parts and 100000 suppliers, and ANALYZE doesn't
seem to have been too far off at estimating either of those numbers.
I think the problem is that there are not very many suppliers for any
particular part, and thus the condition "part match AND supplier match"
is really not much more selective than "part match" alone. The planner
is supposing that their selectivities are independent, which they
aren't.

Offhand I don't see any good way to fix this without multi-column
statistics, which is something that's certainly not happening for 8.2 :-(

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-09-26 22:27:09 Isn't strdup.h useless code?
Previous Message Alvaro Herrera 2006-09-26 21:34:17 Re: Constant changes (Re-Build)