Re: Question: test "aggregates" failed in 32-bit machine

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Question: test "aggregates" failed in 32-bit machine
Date: 2022-09-30 16:57:13
Message-ID: 656578.1664557033@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> The most likely theory, I think, is that that compiler is generating
> slightly different floating-point code causing different plans to
> be costed slightly differently than what the test case is expecting.
> Probably, the different orderings of the keys in this test case have
> exactly the same cost, or almost exactly, so that different roundoff
> error could be enough to change the selected plan.

I added some debug printouts to get_cheapest_group_keys_order()
and verified that in the two problematic queries, there are two
different orderings that have (on my machine) exactly equal lowest
cost. So the code picks the first of those and ignores the second.
Different roundoff error would be enough to make it do something
else.
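
To illustrate the tie-break sensitivity described above, here is a minimal sketch in Python. It is not the PostgreSQL code: pick_cheapest and the order_a/order_b names are hypothetical, and the only assumption carried over from the text is that the first candidate with the strictly lowest cost wins. The costs are made-up numbers whose sums are mathematically equal but are rounded differently depending on how the additions are grouped.

```python
def pick_cheapest(candidates):
    """Return the name of the first candidate with the strictly lowest cost.

    Hypothetical helper: on an exact tie, the earlier candidate is kept.
    """
    best_name, best_cost = None, float("inf")
    for name, cost in candidates:
        if cost < best_cost:  # strict comparison, so ties keep the earlier entry
            best_name, best_cost = name, cost
    return best_name


# Case 1: both orderings have exactly the same cost, so the first one wins.
tie = [("order_a", 0.6), ("order_b", 0.6)]

# Case 2: the same three terms summed with different grouping.
# Mathematically both are 0.6, but the rounded double results differ.
cost_a = (0.1 + 0.2) + 0.3
cost_b = 0.1 + (0.2 + 0.3)
skewed = [("order_a", cost_a), ("order_b", cost_b)]

print(pick_cheapest(tie))
print(pick_cheapest(skewed))
print(cost_a == cost_b)
```

In the second case the selection flips to the other ordering even though nothing about the inputs changed, only the rounding of intermediate results. That is the kind of difference a compiler generating different floating-point code could plausibly introduce.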

I find this problematic because "exactly equal" costs are not going
to be unusual. That's because the values that cost_sort_estimate
relies on are, sadly, just about completely fictional. It's expecting
that it can get a good cost estimate based on:

* procost. In case you hadn't noticed, this is going to be 1 for
just about every function we might be considering here.

* column width. This is either going to be a constant (e.g. 4
for integers) or, again, largely fictional. The logic for
converting widths to cost multipliers adds yet another layer
of debatability.

* numdistinct estimates. Sometimes we know what we're talking
about there, but often we don't.
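
A rough sketch of why exact ties follow from inputs like these, written in Python. The cost formula below is a toy model invented for illustration, not the formula used by the planner, and the column names and statistics are hypothetical. It charges procost times width for each column comparison and assumes a later column is only compared when all earlier columns matched, with probability 1/ndistinct per earlier column.

```python
from itertools import permutations


def toy_ordering_cost(order, stats):
    """Expected per-tuple comparison cost for one key ordering (toy model).

    stats maps a column name to (procost, width, ndistinct).
    """
    cost = 0.0
    reach_prob = 1.0  # chance that the comparison gets as far as this column
    for col in order:
        procost, width, ndistinct = stats[col]
        cost += reach_prob * procost * width
        reach_prob /= ndistinct
    return cost


# Hypothetical inputs: procost 1, width 4, and the same guessed ndistinct everywhere.
uniform = {"a": (1.0, 4, 200.0), "b": (1.0, 4, 200.0), "c": (1.0, 4, 200.0)}

# Same columns, but with a real ndistinct estimate available for one of them.
informed = {"a": (1.0, 4, 10.0), "b": (1.0, 4, 1000.0), "c": (1.0, 4, 10.0)}

uniform_costs = {toy_ordering_cost(p, uniform) for p in permutations(uniform)}
informed_costs = {toy_ordering_cost(p, informed) for p in permutations(informed)}

print(len(uniform_costs))
print(len(informed_costs) > 1)
```

With uniform inputs, every one of the six orderings gets an identical cost in this model, so whichever ordering is examined first is the one chosen. Only when the per-column inputs actually differ does the model separate the orderings at all.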

So what I'm afraid we are dealing with here is usually going to
be garbage in, garbage out. And we're expending an awful lot
of code and cycles to arrive at these highly questionable choices.

Given the previous complaints about db0d67db2, I wonder if it's not
most prudent to revert it. I doubt we are going to get satisfactory
behavior out of it until there are fairly substantial improvements in
all these underlying estimates.

regards, tom lane
