*_collapse_limit, geqo_threshold

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: *_collapse_limit, geqo_threshold
Date: 2009-07-07 05:30:25
Message-ID: 603c8f070907062230v169541b0ka5a939de1132fd5c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I think we should try to do something about join_collapse_limit,
from_collapse_limit, and geqo_threshold for 8.5.

http://archives.postgresql.org/message-id/9134.1243289706@sss.pgh.pa.us
http://archives.postgresql.org/message-id/603c8f070905251800g5b86d2dav26eca7f417d15dbf@mail.gmail.com

I'm still of the opinion that join_collapse_threshold is a loaded
foot-gun, because I don't think that users will expect that a join
specified this way:

SELECT ... FROM a JOIN b ON Pab JOIN c ON Pac JOIN d ON Pad ...

will behave differently than one specified this way:

SELECT ... FROM a, b, c, d WHERE Pab AND Pac AND Pad ...

The whole purpose of join_collapse_limit in the first instance is to
prevent planning time from getting out of control, but I don't see how
we can view it as a very effective safety valve when it depends so
heavily on which syntax is used. If the planning time for an N-way
join is excessive, then we're going to have a problem with excessive
planning time whenever the second syntax is selected, and I don't see
any reason to believe that users see the second syntax as "dangerous"
in terms of planning time but the first syntax as "safer".

One possibility would be to remove join_collapse_limit entirely, but
that would eliminate one possibily-useful piece of functionality that
it current enables: namely, the ability to exactly specify the join
order by setting join_collapse_limit to 1. So one possibility would
be to rename the variable something like explicit_join_order and make
it a Boolean; another possibility would be to change the default value
to INT_MAX.

The approach I've taken in the attached patch is to make 0 mean
"unlimited" and make that the default value. I don't have a strong
feeling about whether that's better than the other two options,
although it seems cleaner to me or I'd not have written the patch that
way. We could also consider adopting this same approach for
from_collapse_limit, though for some reason that behavior marginally
less pathological to me.

At any rate, regardless of whether this patch (or one of the other
approaches mentioned above) are adopted for 8.5, I think we should
raise the default values for whatever is left. The defaults basically
haven't been modified since they were put in, and my experience is
that even queries with 10 to 15 joins perform acceptably for OLTP
workloads, which are exactly the workloads where query planning time
is most likely to be an issue. So I would propose raising each of the
limits by 4 (to 12 for from_collapse_limit and join_collapse_limit if
we don't unlimit them entirely, and to 16 for geqo_threshold). I'm
interested in hearing from anyone who has practical experience with
tuning these variables, or any ideas on what we should test to get a
better idea as to how to set them.

Thanks,

...Robert

Attachment Content-Type Size
unlimit_join_collapse.patch text/x-diff 3.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2009-07-07 07:28:33 Re: Maintenance Policy?
Previous Message Jaime Casanova 2009-07-07 04:48:31 Re: Patch for automating partitions in PostgreSQL 8.4 Beta 2