Re: why hash on the primary key?

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: why hash on the primary key?
Date: 2008-11-29 16:42:33
Message-ID: 603c8f070811290842r214628a8h5f1503b44a07c037@mail.gmail.com
Lists: pgsql-general

>>> Well, that just says your cost parameters need a bit of adjustment
>>> if you'd like the planner to get the crossover point exactly right.
>
>> Any sense of which ones might be worth fiddling with?
>
> random_page_cost, effective_cache_size, maybe the cpu_xxx parameters.
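[The parameters Tom mentions can be tried per-session before changing postgresql.conf. The values below are illustrative assumptions for a mostly-cached database, not recommendations:]

```sql
-- Session-level experiment; tune against EXPLAIN ANALYZE results.
SET random_page_cost = 2.0;        -- default 4.0; lower when data is mostly cached
SET effective_cache_size = '4GB';  -- rough estimate of OS cache + shared_buffers
SET cpu_tuple_cost = 0.01;         -- the default; raise to penalize row-heavy plans
```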

I fiddled with this some more on a more complex query with more than
20 joins. It appears that the culprit is from_collapse_limit. If I
raise from_collapse_limit, most of the hash joins turn into Nested
Loops or Nested Loop Left Joins with Index Scans. The estimated cost
is about 20% of the cost of the original plan, but the planning takes
so much longer that the actual time is 60% larger. This is really the
pits.
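[For reference, a sketch of the experiment described above — from_collapse_limit defaults to 8, so a 20-join query gets planned piecewise unless it is raised; join_collapse_limit usually needs the same treatment for explicit JOIN syntax. The limit value here is an illustrative assumption:]

```sql
SET from_collapse_limit = 24;  -- let the planner flatten subqueries into one join search
SET join_collapse_limit = 24;  -- same, for explicit JOIN syntax
-- Then compare estimated cost vs. total runtime (including planning) with
-- EXPLAIN ANALYZE on the query in question.
```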

I had hoped that the work Simon Riggs was doing on join removal would
hit 8.4, but that doesn't seem likely at this point. The problem is
that the application is a web app where the user can select which
columns they want to see. Most of the time only ~6-8 columns of 40+
that are available are selected, and the joins needed to generate the
unselected columns can be tossed (which would hopefully both avoid
unnecessary join steps and also speed up planning the remaining
joins).
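[A sketch of the kind of join that removal would eliminate, using hypothetical tables: a LEFT JOIN on a unique key contributes exactly one row per outer row, so if the user deselects every column from the joined table, the join has no effect on the result.]

```sql
-- Hypothetical schema: departments.id is the primary key.
-- If no "departments" column is selected, this LEFT JOIN is provably
-- a no-op and join removal could plan it as a plain scan of employees.
SELECT e.name, e.salary
FROM employees e
LEFT JOIN departments d ON d.id = e.department_id;
```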

...Robert
