Re: slow joining very large table to smaller ones

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: Dan Harris <fbsd(at)drivefaster(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: slow joining very large table to smaller ones
Date: 2005-07-15 04:12:15
Message-ID: 42D7379F.3040806@arbash-meinel.com
Lists: pgsql-performance
Dan Harris wrote:
>
> On Jul 14, 2005, at 7:15 PM, John A Meinel wrote:
>
>>
>>
>> Is the distribution of your rows uneven? Meaning do you have more rows
>> with a later id than an earlier one?
>>
>
> There are definitely some id's that will have many times more rows
> than the others.  If I group and count them, the top 10 are fairly
> dominant in the table.

That usually skews the estimates, since the estimate is based on an
average (unless the statistics target is raised).
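For what it's worth, a sketch of raising the per-column statistics
target on the skewed columns and re-analyzing (300 is just an
illustrative value; pick whatever your tables warrant):

  ALTER TABLE k_b ALTER COLUMN id SET STATISTICS 300;
  ALTER TABLE k_r ALTER COLUMN id SET STATISTICS 300;
  ANALYZE k_b;
  ANALYZE k_r;

With a larger target the planner keeps more most-common-value entries
for id, so the dominant ids get costed individually instead of being
folded into the average.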

>
>>>
>>
>> Hmm.. How to do it permanently? Well you could always issue "set
>> join_collapse_limit to 1; select * from ...."
>> But obviously that isn't what you prefer. :)
>>
>> I think there are things you can do to make merge join more expensive
>> than a nested loop, but I'm not sure what they are.
>
>
> Maybe someone else has some ideas to encourage this behavior for  future
> work?  Setting it on a per-connection basis is doable, but  would add
> some burden to us in code.
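If it helps, a rough sketch of making the setting stick without
touching application code (the database and role names below are
placeholders, not anything from your setup):

  -- default for every new connection to that database
  ALTER DATABASE yourdb SET join_collapse_limit = 1;
  -- or per application role
  ALTER USER appuser SET join_collapse_limit = 1;
  -- or cluster-wide, in postgresql.conf:
  --   join_collapse_limit = 1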

My biggest question is why the planner thinks the Nested Loop would be
so expensive.
Have you tuned any of the parameters? It seems like something is out of
whack. (cpu_tuple_cost, random_page_cost, etc...)
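A quick way to check what the planner is currently working with, and
to experiment with one value at a time (the numbers below are only
illustrative, not recommendations):

  SHOW random_page_cost;
  SHOW cpu_tuple_cost;
  SHOW effective_cache_size;
  -- session-local experiment, then re-run the query under EXPLAIN
  SET random_page_cost = 2;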

>
>>
>> What I really don't understand is that the estimates dropped as well.
>> The estimated row count drops to 3k instead of > 1M.
>> The real question is why does the planner think it will be so expensive?
>>
>>
>>> select count(*) from k_b join k_r using (incidentid) where k_b.id=107
>>> and k_r.id=94;
>>> count
>>> -------
>>>    373
>>>
>>>
>>
>> Well, this says that they are indeed much more selective.
>> Each one has > 1k rows, but together you end up with only 400.
>>
>
> Is this a bad thing?  Is this not "selective enough" to make it much
> faster?

Yes, being more selective is what makes it faster. But the planner
doesn't seem to notice it properly.
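One way to see exactly where the estimate goes wrong is to run the
same query under EXPLAIN ANALYZE and compare estimated vs. actual rows
node by node:

  EXPLAIN ANALYZE
  SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
  WHERE k_b.id = 107 AND k_r.id = 94;

Each node prints rows= twice -- the first is the planner's estimate,
the one after "actual" is what really came back -- so a big gap points
at the node whose statistics are off.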

>
> Overall, I'm much happier now after seeing the new plan come about.  If
> I can find a way to make that join_collapse behavior permanent, I can
> certainly live with these numbers.
>

I'm sure there are pieces to tune, but I've reached the limit of the
parameters I know how to tweak :)

> Thanks again for your continued efforts.
>
> -Dan
>

John
=:->
