Re: Question: test "aggregates" failed in 32-bit machine

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Question: test "aggregates" failed in 32-bit machine
Date: 2022-10-01 19:13:59
Message-ID: 961620.1664651639@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> So at this point I've lost all faith in the estimates being meaningful
> at all.

I spent some time today looking into the question of what our qsort
code actually does. I wrote a quick-n-dirty little test module
(attached) to measure the number of comparisons qsort really uses
for assorted sample inputs. The results will move a bit from run
to run because of randomization, but the average counts should be
pretty stable I think. I got results like these:

regression=# create temp table data as
select * from qsort_comparisons(100000);
SELECT 10
regression=# select n * log(groups)/log(2) as est, 100*(n * log(groups)/log(2) - avg_cmps)/avg_cmps as pct_err, * from data;
        est         |      pct_err       |   n    | groups | avg_cmps | min_cmps | max_cmps |          note
--------------------+--------------------+--------+--------+----------+----------+----------+------------------------
                  0 |               -100 | 100000 |      1 |    99999 |    99999 |    99999 | all values the same
 1660964.0474436812 | -5.419880052975057 | 100000 | 100000 |  1756145 |  1722569 |  1835627 | all values distinct
             100000 | -33.33911061041376 | 100000 |      2 |   150013 |   150008 |   150024 | 2 distinct values
             400000 | 11.075628618635713 | 100000 |     16 |   360115 |   337586 |   431376 | 16 distinct values
             600000 |  8.369757612975473 | 100000 |     64 |   553660 |   523858 |   639492 | 64 distinct values
             800000 |  4.770461016221087 | 100000 |    256 |   763574 |   733898 |   844450 | 256 distinct values
            1000000 | 1.5540821186618827 | 100000 |   1024 |   984697 |   953830 |  1111384 | 1024 distinct values
 1457116.0087927429 |  41.97897366170798 | 100000 |  24342 |  1026290 |   994694 |  1089503 | Zipfian, parameter 1.1
 1150828.9986140348 | 158.28880094758154 | 100000 |   2913 |   445559 |   426575 |   511214 | Zipfian, parameter 1.5
  578135.9713524659 |  327.6090378488971 | 100000 |     55 |   135202 |   132541 |   213467 | Zipfian, parameter 3.0
(10 rows)
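
(In case anyone wants to see the flavor of the measurement without reading
the attached module: the trick is just a comparator wrapper that bumps a
counter. Here's a minimal standalone sketch along those lines --- not the
attached code, and using libc's qsort rather than ours, with an assumed
16-distinct-value input --- so the absolute counts won't match the table
above, but it shows the counting mechanism:)

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static long ncompares = 0;

static int
counting_cmp(const void *a, const void *b)
{
    int     va = *(const int *) a;
    int     vb = *(const int *) b;

    ncompares++;
    return (va > vb) - (va < vb);
}

int
main(void)
{
    const int   n = 100000;
    const int   groups = 16;    /* number of distinct values to generate */
    int        *data = malloc(n * sizeof(int));

    srand(42);
    for (int i = 0; i < n; i++)
        data[i] = rand() % groups;

    qsort(data, n, sizeof(int), counting_cmp);

    /* compare the measured count against the N * log2(groups) estimate */
    printf("comparisons: %ld (estimate %.0f)\n",
           ncompares, n * log(groups) / log(2));

    free(data);
    return 0;
}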

So "N * log(NumberOfGroups)" is a pretty solid estimate for
uniformly-sized groups ... except when NumberOfGroups = 1 ... but it
is a significant overestimate if the groups aren't uniformly sized.
Now a factor of 2X or 3X isn't awful --- we're very happy to accept
estimates only that good in other contexts --- but I still wonder
whether this is reliable enough to justify the calculations being
done in compute_cpu_sort_cost. I'm still very afraid that the
conclusions we're drawing about the sort costs for different column
orders are mostly junk.

In any case, something's got to be done about the failure at
NumberOfGroups = 1. Instead of this:

    correctedNGroups = Max(1.0, ceil(correctedNGroups));
    per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);

I suggest

    if (correctedNGroups > 1.0)
        per_tuple_cost += totalFuncCost * LOG2(correctedNGroups);
    else    /* Sorting N all-alike tuples takes only N-1 comparisons */
        per_tuple_cost += totalFuncCost;

(Note that the ceil() here is a complete waste, because all paths leading
to this produced integral estimates already. Even if they didn't, I see
no good argument why ceil() makes the result better.)
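
(To spell out the NumberOfGroups = 1 failure numerically: a throwaway
sketch, with totalFuncCost simply assumed to be 1.0 and LOG2 defined the
obvious way, prints 0 for the current coding and 1 for the proposed one:)

#include <math.h>
#include <stdio.h>

#define LOG2(x) (log(x) / 0.693147180559945)

int
main(void)
{
    double      totalFuncCost = 1.0;    /* assumed per-comparison cost */
    double      correctedNGroups = 1.0; /* all input values identical */
    double      old_cost;
    double      new_cost;

    /* current coding: LOG2(1) = 0, so no comparison cost is charged */
    old_cost = totalFuncCost * LOG2(fmax(1.0, ceil(correctedNGroups)));

    /* proposed coding: charge about one comparison per tuple instead */
    if (correctedNGroups > 1.0)
        new_cost = totalFuncCost * LOG2(correctedNGroups);
    else
        new_cost = totalFuncCost;

    printf("per-tuple comparison cost: old %g, new %g\n",
           old_cost, new_cost);
    return 0;
}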

I'm still of the opinion that we need to revert this code for now.

regards, tom lane

Attachment           Content-Type  Size
count_comparisons.c  text/x-c      7.2 KB
