Re: Question: test "aggregates" failed in 32-bit machine

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Question: test "aggregates" failed in 32-bit machine
Date: 2022-09-30 19:40:02
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Given the previous complaints about db0d67db2, I wonder if it's not
> most prudent to revert it. I doubt we are going to get satisfactory
> behavior out of it until there's fairly substantial improvements in
> all these underlying estimates.

After spending some more time looking at the code, I think that that
is something we absolutely have to discuss. I already complained at
[1] about how db0d67db2 made very significant changes in sort cost
estimation behavior, which seem likely to result in significant
user-visible plan changes that might or might not be for the better.
But I hadn't read any of the code at that point. Now I have, and
frankly it's not ready for prime time. Beyond the question of
whether we have sufficiently accurate input values, I see these
issues in and around compute_cpu_sort_cost():

1. The algorithm is said to be based on Sedgewick & Bentley 2002 [2].
I have the highest regard for those two gentlemen, so I'm quite
prepared to believe that their estimate of the number of comparisons
used by Quicksort is good. However, the expression given in our

* log(N! / (X1! * X2! * ..)) ~ sum(Xi * log(N/Xi))

doesn't look much like anything they wrote. More, what we're actually
doing is

* We assume all Xi the same because now we don't have any estimation of
* group sizes, we have only know the estimate of number of groups (distinct
* values). In that case, formula becomes:
* N * log(NumberOfGroups)

That's a pretty drastic simplification. No argument is given as to why
that's still reliable enough to be useful for the purposes to which this
code tries to put it --- especially when you consider that real-world
data is more likely to follow Zipf's law than have uniform group sizes.
If you're going to go as far as doing this:

* For multi-column sorts we need to estimate the number of comparisons for
* each individual column - for example with columns (c1, c2, ..., ck) we
* can estimate that number of comparisons on ck is roughly
* ncomparisons(c1, c2, ..., ck) / ncomparisons(c1, c2, ..., c(k-1))

you'd better pray that your number-of-comparisons estimates are pretty
darn good, or what you're going to get out is going to be mostly

2. Sedgewick & Bentley analyzed a specific version of Quicksort,
which is ... um ... not the version we are using. It doesn't look
to me like the choice of partitioning element is the same. Maybe
that doesn't matter much in the end, but there's sure no discussion
of the point in this patch.

So at this point I've lost all faith in the estimates being meaningful
at all. And that's assuming that the simplified algorithm is
implemented accurately, which it is not:

3. totalFuncCost is started off at 1.0. Surely that should be zero?
If not, at least a comment to justify it would be nice.

4. The code around the add_function_cost call evidently wants to carry
the procost lookup result from one column to the next, because it
skips the lookup when prev_datatype == em->em_datatype. However, the
value of funcCost isn't carried across columns, because it's local to
the loop. The effect of this is that anyplace where adjacent GROUP BY
columns are of the same datatype, we'll use the fixed 1.0 value of
funcCost instead of looking up the real procost. Admittedly, since
the real procost is probably also 1.0, this might not mean much in
practice. Nonetheless it's broken code. (Oh, btw: I doubt that
using add_function_cost rather than raw procost is of any value
whatsoever if you're just going to pass it a NULL node tree.)

5. I'm pretty dubious about the idea that we can use the rather-random
first element of the EquivalenceClass to determine the datatype that
will be compared, much less the average widths of the columns. It's
entirely possible for an EC to contain both int4 and int8 vars, or
text vars of substantially different average widths. I think we
really need to be going back to the original GroupClauses and looking
at the variables named there.

6. Worse than that, we're also using the first element of the
EquivalenceClass to calculate the number of groups of this sort key.
This is FLAT OUT WRONG, as certainly different EC members can have
very different stats.

7. The code considers that presorted-key columns do not add to the
comparison costs, yet the comment about it claims the opposite:

* Presorted keys are not considered in the cost above, but we still
* do have to compare them in the qsort comparator. So make sure to
* factor in the cost in that case.
if (i >= nPresortedKeys)

I'm not entirely sure whether the code is broken or the comment is,
but at least one of them is. I'm also pretty confused about why
we still add such columns' comparison functions to the running
totalFuncCost if we think they're not sorted on.

8. In the case complained of to start this thread, we're unable
to perceive any sort-cost difference between "p, d, c, v" and
"p, c, d, v", which is a little surprising because that test case
sets up c with twice as many distinct values as d. Other things
being equal (which they are, because both columns are int4), surely
the latter key ordering should be favored in hopes of reducing the
number of times we have to compare the third column. But it's not.
I think that this can probably be blamed on the early-exit condition
at the bottom of the loop:

* Once we get single-row group, it means tuples in the group are
* unique and we can skip all remaining columns.
if (tuplesPerPrevGroup <= 1.0)

Ordering on p already gets us down to 2 tuples per group, so pretty
much any of the other columns as second grouping column will compute
a next group size of 1, and then we don't consider columns beyond that.

9. The is_fake_var() hackery is pretty sad. We should have found a
better solution than that. Maybe estimate_num_groups() needs more

10. As I already mentioned, get_width_cost_multiplier() doesn't appear
to have any foundation in reality; or if it does, the comments sure
provide no justification for these particular equations rather than
some other ones. The shakiness of the logic can be inferred
immediately from the fact that the header comment is fundamentally
confused about what it's doing:
* Return value is in cpu_operator_cost units.
No it isn't, it's a pure ratio.

In short, I think the claim that this code provides better sort cost
estimates than we had before is quite unjustified. Maybe it could
get there eventually, but I do not want to ship v15 with this.
I think we ought to revert all the changes around cost_sort.

Perhaps we could salvage the GROUP BY changes by just ordering the
columns by decreasing number of groups, which is the only component of
the current cost estimation that I think has any detectable connection
to reality. But I suspect the RMT will favor just reverting the
whole thing for v15.

regards, tom lane

[2] The URL given in the code doesn't work anymore, but this does:

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-09-30 20:15:24 Re: predefined role(s) for VACUUM and ANALYZE
Previous Message Nathan Bossart 2022-09-30 19:23:50 Re: predefined role(s) for VACUUM and ANALYZE