Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adam Scott <adam(dot)c(dot)scott(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
Date: 2020-01-06 20:28:34
Message-ID: 5794.1578342514@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Andrey Borodin <x4mmm(at)yandex-team(dot)ru> writes:
> PFA possible fix for this.

I looked into this a bit too. I don't have a lot of faith in Michael's
example; it took me four tries to find a machine it would fail on.
Still, I did eventually get a crash, and I concur that the problem is
that calc_hist_selectivity_contained tries to access the entry after
the histogram end.

I believe that Michael's draft fix is actually about right, because
we don't want to treat the key space above the histogram upper bound
as being an actual bin: we should assume that there are no values there.
So just starting the loop at the last real bin is sufficient. However,
I'd prefer to handle the two edge cases (i.e., probe value below histogram
lower bound or above histogram upper bound) explicitly, because that makes
it more apparent what's going on.

There are a couple of other ways in which this code seems insufficiently
paranoid to me:

* It seems worth spending a couple lines of code at the beginning to
verify that the histogram has at least one bin, as it already did for
the range-length histogram. Otherwise we've got various divide-by-zero
hazards here, along with a need for extra tests in the looping functions.

* I think we'd better guard against NaNs coming back from the range
subdiff function, as well as the possibility of getting a NaN from
the division in get_position (compare [1]).

* I am not quite totally sure about this part, but it seems to me
that calc_hist_selectivity_contains probably ought to handle the
range-bound cases the same as calc_hist_selectivity_contained,
even though only the latter is trying to access an unsafe array
index.

That all leads me to the attached draft patch. It lacks a test
case --- I wonder if we can find one that crashes more portably
than Michael's? And more eyeballs on calc_hist_selectivity_contains
would be good too.

regards, tom lane

[1] https://www.postgresql.org/message-id/12957.1577974305@sss.pgh.pa.us

Attachment Content-Type Size
bug-16122-fix-wip.patch text/x-diff 4.7 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2020-01-07 00:22:16 Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema
Previous Message Jeff Janes 2020-01-06 17:50:48 Re: BUG #16183: PREPARED STATEMENT slowed down by jit