Re: slow joining very large table to smaller ones

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: Dan Harris <fbsd(at)drivefaster(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: slow joining very large table to smaller ones
Date: 2005-07-15 04:12:15
Message-ID: 42D7379F.3040806@arbash-meinel.com
Lists: pgsql-performance
Dan Harris wrote:
>
> On Jul 14, 2005, at 7:15 PM, John A Meinel wrote:
>
>>
>>
>> Is the distribution of your rows uneven? Meaning do you have more rows
>> with a later id than an earlier one?
>>
>
> There are definitely some id's that will have many times more rows
> than the others.  If I group and count them, the top 10 are fairly
> dominant in the table.

That usually skews the estimates, since the estimate is based on an
average (unless the statistics target is raised).
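For what it's worth, a sketch of raising the per-column statistics
target on the skewed columns and re-analyzing (300 is just an
illustrative value; pick whatever your tables warrant):

  ALTER TABLE k_b ALTER COLUMN id SET STATISTICS 300;
  ALTER TABLE k_r ALTER COLUMN id SET STATISTICS 300;
  ANALYZE k_b;
  ANALYZE k_r;

With a larger target the planner keeps more most-common-value entries
for id, so the dominant ids get costed individually instead of being
folded into the average.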

>
>>>
>>
>> Hmm.. How to do it permanently? Well you could always issue "set
>> join_collapse_limit to 1; select * from ...."
>> But obviously that isn't what you prefer. :)
>>
>> I think there are things you can do to make merge join more expensive
>> than a nested loop, but I'm not sure what they are.
>
>
> Maybe someone else has some ideas to encourage this behavior for  future
> work?  Setting it on a per-connection basis is doable, but  would add
> some burden to us in code.
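If it helps, a rough sketch of making the setting stick without
touching application code (the database and role names below are
placeholders, not anything from your setup):

  -- default for every new connection to that database
  ALTER DATABASE yourdb SET join_collapse_limit = 1;
  -- or per application role
  ALTER USER appuser SET join_collapse_limit = 1;
  -- or cluster-wide, in postgresql.conf:
  --   join_collapse_limit = 1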

My biggest question is why the planner thinks the Nested Loop would be
so expensive.
Have you tuned any of the parameters? It seems like something is out of
whack. (cpu_tuple_cost, random_page_cost, etc...)
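A quick way to check what the planner is currently working with, and
to experiment with one value at a time (the numbers below are only
illustrative, not recommendations):

  SHOW random_page_cost;
  SHOW cpu_tuple_cost;
  SHOW effective_cache_size;
  -- session-local experiment, then re-run the query under EXPLAIN
  SET random_page_cost = 2;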

>
>>
>> What I really don't understand is that the estimates dropped as well.
>> The estimated row count drops to 3k instead of > 1M.
>> The real question is why does the planner think it will be so expensive?
>>
>>
>>> select count(*) from k_b join k_r using (incidentid) where k_b.id=107
>>> and k_r.id=94;
>>> count
>>> -------
>>>    373
>>>
>>>
>>
>> Well, this says that they are indeed much more selective.
>> Each one has > 1k rows, but together you end up with only 400.
>>
>
> Is this a bad thing?  Is this not "selective enough" to make it much
> faster?

Yes, being more selective is what makes it faster. But the planner
doesn't seem to notice it properly.
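One way to see exactly where the estimate goes wrong is to run the
same query under EXPLAIN ANALYZE and compare estimated vs. actual rows
node by node:

  EXPLAIN ANALYZE
  SELECT count(*) FROM k_b JOIN k_r USING (incidentid)
  WHERE k_b.id = 107 AND k_r.id = 94;

Each node prints rows= twice -- the first is the planner's estimate,
the one after "actual" is what really came back -- so a big gap points
at the node whose statistics are off.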

>
> Overall, I'm much happier now after seeing the new plan come about.  If
> I can find a way to make that join_collapse behavior permanent, I can
> certainly live with these numbers.
>

I'm sure there are pieces to tune, but I've reached the limit of the
parameters I know how to tweak :)

> Thanks again for your continued efforts.
>
> -Dan
>

John
=:->
