I've been doing some benchmarking and profiling on the PostgreSQL
query analyzer, and it seems that (at least for the sorts of queries
that I typically run) the dominant cost is add_path(). I've been able
to find two optimizations that seem to help significantly:
1. add_path() often calls compare_fuzzy_path_costs() twice on the same
pair of paths, and when the paths compare equal on one criterion, some
comparisons are duplicated. I've refactored this function to return
the results of both calculations at once, without repeating any
floating-point comparisons.
2. match_unsorted_outer() adds as many as 5 nested loop joins at a
time with the same set of pathkeys. In my tests, it tended to be about
three: cheapest inner, cheapest inner materialized, and cheapest inner
indexscan.
Since these all have the same pathkeys, clearly only the one with the
cheapest total cost is in the running for cheapest total cost for that
set of pathkeys, and likewise for startup cost (and the two may be the
same). Yet we compare all of them against the whole pathlist, one
after the other, including (for the most part) the rather expensive
pathkey comparison. I've added a function add_similar_paths() and
refactored match_unsorted_outer() to use it.
On a couple of complex (and proprietary) queries with 12+ joins each,
I measure a planning time improvement of 8-12% with the attached patch
applied. It would be interesting to try to replicate this on a
publicly available data set, but I don't know of a good one to use.
Suggestions welcome - results of performance testing on your own
favorite big queries even more welcome. Simple test harness also
attached. I took the approach of dropping caches, starting the
server, and then running this 5 times each on several queries,
dropping top and bottom results.