Dan Harris wrote:
> On Jul 14, 2005, at 7:15 PM, John A Meinel wrote:
>> Is the distribution of your rows uneven? Meaning do you have more rows
>> with a later id than an earlier one?
> There are definitely some id's that will have many times more than the
> others. If I group and count them, the top 10 are fairly dominant in
> the table.
That usually skews the estimates. Since the estimate is more of an
average (unless the statistics are higher).
>> Hmm.. How to do it permanantly? Well you could always issue "set
>> join_collapse set 1; select * from ...."
>> But obviously that isn't what you prefer. :)
>> I think there are things you can do to make merge join more expensive
>> than a nested loop, but I'm not sure what they are.
> Maybe someone else has some ideas to encourage this behavior for future
> work? Setting it on a per-connection basis is doable, but would add
> some burden to us in code.
My biggest question is why the planner things the Nested Loop would be
Have you tuned any of the parameters? It seems like something is out of
whack. (cpu_tuple_cost, random_page_cost, etc...)
>> What I really don't understand is that the estimates dropped as well.
>> The actual number of estimate rows drops to 3k instead of > 1M.
>> The real question is why does the planner think it will be so expensive?
>>> select count(*) from k_b join k_r using (incidentid) where k_b.id=107
>>> and k_r.id=94;
>> Well, this says that they are indeed much more selective.
>> Each one has > 1k rows, but together you end up with only 400.
> Is this a bad thing? Is this not "selective enough" to make it much
Yes, being more selective is what makes it faster. But the planner
doesn't seem to notice it properly.
> Overall, I'm much happier now after seeing the new plan come about, if
> I can find a way to make that join_collapse behavior permanent, I can
> certainly live with these numbers.
I'm sure there are pieces to tune, but I've reached my limits of
parameters to tweak :)
> Thanks again for your continued efforts.
In response to
pgsql-performance by date
|Next:||From: Karim Nassar||Date: 2005-07-15 07:00:58|
|Subject: What's a lot of connections?|
|Previous:||From: Dan Harris||Date: 2005-07-15 03:49:50|
|Subject: Re: slow joining very large table to smaller ones|