Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adam Scott <adam(dot)c(dot)scott(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
Date: 2020-01-09 05:45:45
Message-ID: 20200109054545.GL2251@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Jan 07, 2020 at 10:08:07AM -0500, Tom Lane wrote:
> Hmm ... maybe, but I'm not entirely convinced that they should be
> mirror images. The endpoint handling is the same right now, but
> should it be? Anyway, merging them seems out of scope for a bug fix.

Sure. That was just a random though while reading again the code.

> I wouldn't mind having a test case if it crashes reliably, but it
> doesn't for me. I poked into why not, and realized that even if
> get_position is passed a pointer to garbage data in hist2, it will
> not crash unless the "!hist2->infinite" test succeeds, because
> otherwise it won't call the rng_subdiff function. Depending
> on compiler and optimization level, that usually means that any
> nonzero value in that byte will prevent a crash. Given that the
> whole point here is we're accessing memory that hasn't been
> initialized by this code, we don't have any way to make it likely
> that that won't be true.

Ah, OK. That makes sense. If you don't wish to keep the test around
in core, that's fine to me. I may keep that into one of my own repos
just for history's sake.

> BTW, I noticed that at -O0, my compiler generates a code sequence
> that effectively implements that test as "hist2->infinite != true",
> ie it will succeed for any bit pattern that's not exactly 0x01.
> This greatly raises the odds of seeing the crash of course (so maybe
> your compiler is doing likewise?). I find that rather disturbing
> because my mental model has always been that any nonzero bit pattern
> in a bool would be interpreted as true. Apparently the gcc boys think
> that they're entitled to consider _Bool values other than exactly 0
> and 1 as undefined, which is scary because I'm not sure we are totally
> consistent about that everywhere. It definitely makes it harder to
> reason about what will happen with garbage memory contents. (This
> might also explain why you saw a change in behavior at 9a95a77d9d.)

That would not be the first only time my environment behaves different
than the others. My build scripts have been pretty good at catching
issues with compiled code, particularly when it comes to ordering.

>> - if (bin_width <= 0.0)
>> + if (isnan(bin_width) || bin_width <= 0.0)
>> return 0.5; /* zero width bin */
>> This comment is incorrect now?
>
> Yeah, it could be improved. Maybe "punt for NaN or zero-width bin"?

Sounds fine to me.

>> + if (isnan(position))
>> + return 0.5; /* punt if we have any NaNs or Infs */
>> +
>> This comment is incorrect as well, no? In this case both histograms
>> have finite bounds, and you just check if that's a number.
>
> The point is that we could get a NaN from the division either from
> the rng_subdiff function returning NaN, or from an undefined case
> such as Inf/Inf. Maybe write "punt for NaN input, Inf/Inf, etc"?
> (We don't need to test for Inf explicitly, because the tests just
> below can deal with that.)

I would rewrite this comment for clarity. But I am fine to leave that
to you at the end if you feel differently.
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2020-01-10 01:43:34 Re: Error while trying to open pgadmin
Previous Message Michael Paquier 2020-01-09 04:38:09 Re: libpq parameter parsing problem