Re: add_path optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: add_path optimization
Date: 2009-02-28 02:35:34
Message-ID: 1268.1235788534@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[ This patch is on the 2009-First commitfest list, but there was earlier
discussion to the effect that it might be a good idea to include in 8.4,
so I made some time to look at it. ]

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I've been doing some benchmarking and profiling on the PostgreSQL
> query analyzer, and it seems that (at least for the sorts of queries
> that I typically run) the dominant cost is add_path(). I've been able
> to find two optimizations that seem to help significantly:

I got around to testing this patch a bit today, and I'm afraid it
doesn't look very good from here. The test case I used was a psql
script with 10000 repetitions of

explain select * from information_schema.table_constraints
join information_schema.referential_constraints
using (constraint_catalog,constraint_schema,constraint_name);

which isn't terribly creative, but those two views incorporate multiple
joins so I figured this was a reasonably middle-of-the-road planner
exercise.

I first tried just the compare_fuzzy_path_costs() change and really
couldn't measure any reliable difference. oprofile output for CVS HEAD
looks like

PU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 100000
samples % image name symbol name
149294 9.7519 postgres AllocSetAlloc
85437 5.5808 postgres SearchCatCache
45520 2.9734 postgres copyObject
39525 2.5818 postgres add_path
35878 2.3436 postgres expression_tree_walker
29733 1.9422 libc-2.8.so memcpy
25560 1.6696 postgres MemoryContextAllocZeroAligned
20879 1.3638 libc-2.8.so strlen
20521 1.3404 postgres add_paths_to_joinrel
20054 1.3099 postgres AllocSetFree
20023 1.3079 postgres cost_mergejoin
19363 1.2648 postgres nocachegetattr
17689 1.1554 libc-2.8.so vfprintf
17335 1.1323 postgres generate_join_implied_equalities
16085 1.0507 postgres MemoryContextAlloc

and with the compare_fuzzy_path_costs() change

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 100000
samples % image name symbol name
144882 9.6951 postgres AllocSetAlloc
84205 5.6348 postgres SearchCatCache
43956 2.9414 postgres copyObject
37418 2.5039 postgres add_path
34865 2.3331 postgres expression_tree_walker
28718 1.9217 libc-2.8.so memcpy
23643 1.5821 postgres MemoryContextAllocZeroAligned
20835 1.3942 libc-2.8.so strlen
20065 1.3427 postgres cost_mergejoin
19672 1.3164 postgres AllocSetFree
18992 1.2709 postgres add_paths_to_joinrel
18802 1.2582 postgres nocachegetattr
17109 1.1449 libc-2.8.so __printf_fp
16901 1.1310 libc-2.8.so vfprintf
16159 1.0813 postgres hash_search_with_hash_value
16039 1.0733 postgres generate_join_implied_equalities
15554 1.0408 postgres MemoryContextAlloc
14981 1.0025 postgres expression_tree_mutator

Note that the compiler is inlining compare_fuzzy_path_costs in both
cases.

I then added the add_similar_paths change, and got this:

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 100000
samples % image name symbol name
142645 9.6668 postgres AllocSetAlloc
83161 5.6357 postgres SearchCatCache
44660 3.0265 postgres copyObject
35762 2.4235 postgres expression_tree_walker
28427 1.9264 postgres add_path
27970 1.8955 libc-2.8.so memcpy
23317 1.5801 postgres MemoryContextAllocZeroAligned
20685 1.4018 libc-2.8.so strlen
19646 1.3314 postgres add_paths_to_joinrel
19129 1.2963 postgres AllocSetFree
18065 1.2242 postgres cost_mergejoin
17768 1.2041 postgres nocachegetattr
17249 1.1689 postgres generate_join_implied_equalities
16784 1.1374 libc-2.8.so vfprintf
15740 1.0667 libc-2.8.so __printf_fp
15642 1.0600 postgres MemoryContextAlloc
15194 1.0297 postgres expression_tree_mutator
15009 1.0171 postgres hash_search_with_hash_value
...
7746 0.5249 postgres add_similar_paths

where at least you can see a change in add_path's percentage, although
overall runtime did not change measurably; and in any case whatever was
saved seems to have gone into add_similar_paths.

It gets worse though: add_similar_paths is flat *wrong*. It compares
each successive path to only the current cheapest-total candidate,
and thus is quite likely to seize on the wrong cheapest-startup path.
I tried fixing its loop logic like so:

for (i = 1; i < npath; ++i)
{
if (compare_fuzzy_path_costs(cheapest_total, paths[i], TOTAL_COST) > 0)
cheapest_total = paths[i];
if (compare_fuzzy_path_costs(cheapest_startup, paths[i], STARTUP_COST) > 0)
cheapest_startup = paths[i];
}

(obviously, this is with compare_fuzzy_path_costs reverted to its form
in HEAD) and got this:

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 100000
samples % image name symbol name
143675 9.6105 postgres AllocSetAlloc
83809 5.6060 postgres SearchCatCache
43837 2.9323 postgres copyObject
35790 2.3940 postgres expression_tree_walker
29870 1.9980 postgres compare_fuzzy_path_costs
28521 1.9078 libc-2.8.so memcpy
23448 1.5685 postgres MemoryContextAllocZeroAligned
21049 1.4080 libc-2.8.so strlen
18343 1.2270 postgres add_path
18154 1.2143 postgres AllocSetFree
18119 1.2120 postgres cost_mergejoin
18036 1.2064 postgres add_paths_to_joinrel
17791 1.1901 postgres nocachegetattr
17277 1.1557 postgres generate_join_implied_equalities
16956 1.1342 libc-2.8.so vfprintf
16125 1.0786 postgres hash_search_with_hash_value
15733 1.0524 libc-2.8.so __printf_fp
15387 1.0292 postgres MemoryContextAlloc
...
5759 0.3852 postgres add_similar_paths

Here we have a total of 3.6% spent in code that had used ~2.5% before
we started. Not very promising.

Now, maybe if we could force the compiler to inline
compare_fuzzy_path_costs we could buy some of that back. Another
possibility is to contort add_similar_path some more so that it avoids
double comparisons so long as cheapest_total and cheapest_startup are
the same path; I have a feeling that's not going to win much though.

So it's kind of looking like a dead end from here. But in any case the
patch isn't acceptable as-is with that fundamental logic error in
add_similar_paths, so I'm bouncing it back for rework.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-02-28 03:38:48 Re: add_path optimization
Previous Message James Pye 2009-02-28 02:19:51 Re: xpath processing brain dead