Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adam Scott <adam(dot)c(dot)scott(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
Date: 2020-01-07 06:39:07
Message-ID: 20200107063907.GF2386@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Jan 06, 2020 at 03:28:34PM -0500, Tom Lane wrote:
> I believe that Michael's draft fix is actually about right, because
> we don't want to treat the key space above the histogram upper bound
> as being an actual bin: we should assume that there are no values there.
> So just starting the loop at the last real bin is sufficient. However,
> I'd prefer to handle the two edge cases (i.e., probe value below histogram
> lower bound or above histogram upper bound) explicitly, because that makes
> it more apparent what's going on.

Actually, I suspect that the original error comes from a copy-pasto
from calc_hist_selectivity_contains to calc_hist_selectivity_contained
when the code has been originally written.

> There are a couple of other ways in which this code seems insufficiently
> paranoid to me:
>
> * It seems worth spending a couple lines of code at the beginning to
> verify that the histogram has at least one bin, as it already did for
> the range-length histogram. Otherwise we've got various divide-by-zero
> hazards here, along with a need for extra tests in the looping functions.

Fine by me.

> * I think we'd better guard against NaNs coming back from the range
> subdiff function, as well as the possibility of getting a NaN from
> the division in get_position (compare [1]).

Good idea.

> * I am not quite totally sure about this part, but it seems to me
> that calc_hist_selectivity_contains probably ought to handle the
> range-bound cases the same as calc_hist_selectivity_contained,
> even though only the latter is trying to access an unsafe array
> index.

Both could actually be merged, no? The assumptions behind the
upper/lower bound lookups are actually just mirrors of each other
based on which side of the operators <@ or @> the code looks at.
Perhaps the current code is cleaner as done currently anyway..

> That all leads me to the attached draft patch. It lacks a test
> case --- I wonder if we can find one that crashes more portably
> than Michael's? And more eyeballs on calc_hist_selectivity_contains
> would be good too.

It would be nice to keep the test case as it crashes immediately on
own Debian laptop. Anyway, if we add a test, we really need
something that pokes at the bin with the histogram with the upper
bound. Just something like the attached on HEAD proves that we have
zero coverage for it:
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -1050,6 +1050,7 @@ calc_hist_selectivity_contained(TypeCacheEntry *typcache,
double dist;
double length_hist_frac;
bool final_bin = false;
+ Assert(i == upper_index);
So that's not good from the start..

- if (bin_width <= 0.0)
+ if (isnan(bin_width) || bin_width <= 0.0)
return 0.5; /* zero width bin */
This comment is incorrect now?

+ if (isnan(position))
+ return 0.5; /* punt if we have any NaNs or Infs */
+
This comment is incorrect as well, no? In this case both histograms
have finite bounds, and you just check if that's a number.
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2020-01-07 06:44:12 Re: BUG #16190: The usage of NULL pointer in refint.c
Previous Message Andres Freund 2020-01-07 02:26:54 Re: BUG #16190: The usage of NULL pointer in refint.c