Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adam Scott <adam(dot)c(dot)scott(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
Date: 2020-01-07 15:08:07
Message-ID: 23459.1578409687@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Michael Paquier <michael(at)paquier(dot)xyz> writes:
> On Mon, Jan 06, 2020 at 03:28:34PM -0500, Tom Lane wrote:
>> * I am not quite totally sure about this part, but it seems to me
>> that calc_hist_selectivity_contains probably ought to handle the
>> range-bound cases the same as calc_hist_selectivity_contained,
>> even though only the latter is trying to access an unsafe array
>> index.

> Both could actually be merged, no? The assumptions behind the
> upper/lower bound lookups are actually just mirrors of each other
> based on which side of the operators <@ or @> the code looks at.

Hmm ... maybe, but I'm not entirely convinced that they should be
mirror images. The endpoint handling is the same right now, but
should it be? Anyway, merging them seems out of scope for a bug fix.

>> That all leads me to the attached draft patch. It lacks a test
>> case --- I wonder if we can find one that crashes more portably
>> than Michael's? And more eyeballs on calc_hist_selectivity_contains
>> would be good too.

> It would be nice to keep the test case as it crashes immediately on
> own Debian laptop.

I wouldn't mind having a test case if it crashes reliably, but it
doesn't for me. I poked into why not, and realized that even if
get_position is passed a pointer to garbage data in hist2, it will
not crash unless the "!hist2->infinite" test succeeds, because
otherwise it won't call the rng_subdiff function. Depending
on compiler and optimization level, that usually means that any
nonzero value in that byte will prevent a crash. Given that the
whole point here is we're accessing memory that hasn't been
initialized by this code, we don't have any way to make it likely
that that won't be true.

(Of course, we could have calc_hist_selectivity() allocate one
extra array slot and zero it, but that's getting a bit silly.)

BTW, I noticed that at -O0, my compiler generates a code sequence
that effectively implements that test as "hist2->infinite != true",
ie it will succeed for any bit pattern that's not exactly 0x01.
This greatly raises the odds of seeing the crash of course (so maybe
your compiler is doing likewise?). I find that rather disturbing
because my mental model has always been that any nonzero bit pattern
in a bool would be interpreted as true. Apparently the gcc boys think
that they're entitled to consider _Bool values other than exactly 0
and 1 as undefined, which is scary because I'm not sure we are totally
consistent about that everywhere. It definitely makes it harder to
reason about what will happen with garbage memory contents. (This
might also explain why you saw a change in behavior at 9a95a77d9d.)

So the bottom line here is that I don't think that we can build a
test case that will crash with any large probability, especially
not if it's just one part of a test script --- the longer the session
has been running, the greater the odds are that whatever is just past
the allocated histogram array contains nonzero garbage. I'm not sure
it's worth spending test cycles forevermore on a test that most likely
will fail to reveal a bug even if it's there.

> - if (bin_width <= 0.0)
> + if (isnan(bin_width) || bin_width <= 0.0)
> return 0.5; /* zero width bin */
> This comment is incorrect now?

Yeah, it could be improved. Maybe "punt for NaN or zero-width bin"?

> + if (isnan(position))
> + return 0.5; /* punt if we have any NaNs or Infs */
> +
> This comment is incorrect as well, no? In this case both histograms
> have finite bounds, and you just check if that's a number.

The point is that we could get a NaN from the division either from
the rng_subdiff function returning NaN, or from an undefined case
such as Inf/Inf. Maybe write "punt for NaN input, Inf/Inf, etc"?
(We don't need to test for Inf explicitly, because the tests just
below can deal with that.)

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Heikki Linnakangas 2020-01-07 15:33:23 Re: REINDEX CONCURRENTLY unexpectedly fails
Previous Message Oleksandr Shulgin 2020-01-07 14:23:41 Re: libpq parameter parsing problem