From: | "Robert Haas" <robertmhaas(at)gmail(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: benchmarking the query planner |
Date: | 2008-12-12 03:04:51 |
Message-ID: | 603c8f070812111904j4b839c17sabde937e3d4cf08a@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>> OK, I'll bite. How do we decide where to put the cutoff? If we make
>> it too high, it will have a negative effect on join selectivity
>> estimates; if it's too low, it won't really address the problem we're
>> trying to fix. I randomly propose p = 0.001, which should limit
>> eqjoinsel() to about a million equality tests in the worst case. In
>> the synthetic example we were just benchmarking, that causes the
>> entire MCV array to be tossed out the window (which feels about
>> right).
>
> Yeah. One idle thought I had was that maybe the cutoff needs to
> consider both probabilities: if the high-frequency MCVs on one side
> chance to match to not-so-high-frequency MCVs on the other, you
> would like to know about that. As long as we keep the lists in
> frequency order, it seems easy to implement this: for each MCV
> examined by the outer loop, you run the inner loop until the product of
> the outer and current inner frequency drops below whatever your
> threshold is. This doesn't immediately suggest what the threshold
I had this idle thought too, but I didn't write it down because...
> ought to be, but it seems like it ought to be possible to determine
> that given a desired maximum error in the overall estimate. I'm also
> not very clear on what the "total frequency" computations (matchfreq2
> and unmatchfreq2 in the current code) ought to look like if we are using
> a variable subset of the inner list.
...of this exact concern, which I think is an insurmountable problem.
If you don't consider some of the MCVs AT ALL, I think you can add
their frequencies back in to otherfreq{1,2} and go home, but if you
consider them for some elements of the other list but not all, I'm not
sure there's an appropriate way to proceed.
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2008-12-12 03:11:43 | Re: Updates of SE-PostgreSQL 8.4devel patches (r1268) |
Previous Message | Nathan Boley | 2008-12-12 02:39:56 | Re: benchmarking the query planner |