Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adam Scott <adam(dot)c(dot)scott(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16122: segfault pg_detoast_datum (datum=0x0) at fmgr.c:1833 numrange query
Date: 2020-01-03 19:05:09
Message-ID: 837E8CEE-FFF3-497C-B7C2-37BB42FAF2C5@yandex-team.ru
Lists: pgsql-bugs

> On Dec 10, 2019, at 10:38, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Nov 19, 2019 at 08:40:56PM +0900, Michael Paquier wrote:
>> If you add an ANALYZE on the table natica_hdu_test after restoring, I
>> am rather sure that you would reproduce the crash more quickly because
>> the handling around the stats of the column is busted here. Anyway,
>> taking my example from upthread, I have also been able to reproduce the
>> problem on REL_10_STABLE even with assertions enabled: the trick is
>> that you need to leave the session once after the analyze on the
>> table. Then a SELECT within a new session is enough to crash the
>> server.
>
> So... I have looked more at this one, and from my previous example it
> seems that we have an off-by-one error when looking up the array
> holding the histograms for ranges (lower and upper bound).
>
> In my previous example, we get to build 101 RangeBounds when beginning
> to calculate the range operator selectivity in
> calc_hist_selectivity(). However, when we get to the point of
> calc_hist_selectivity_contained(), upper_index gets calculated at 100
> which is just at the limit of the indexed bounds, and the code would
> happily look at the last bound as well as the one-after-the-last bound
> as range_cmp_bounds() sees fit, but the latter just points to the
> void. The code looks wrong since its introduction in 59d0bf9d and
> it seems that the changes done for free_attstatsslot() in 9aab83f make
> the issue more easily reproducible.
>
> A fix like the rough POC attached addresses the issue, but I think
> that's too naive to not count for the first bin in the ranges
> evaluated. Tomas, you may be more familiar with this area of the code
> than I am. What do you think?

So... I was looking for some patches on CF and found this one. It's a one-liner, what could go wrong?

1. The new tests do not fail on my machine on the added test case without that line. (On the master branch; from this thread I concluded that they should fail.)
2. I believe the line should not be
+ for (i = upper_index - 1; i >= 0; i--)
but rather
+ for (i = min(upper_index, hist_nvalues - 2); i >= 0; i--)
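
To illustrate the difference between the two loop bounds, here is a minimal standalone sketch. It is not the PostgreSQL code: the helper names clamp_start_bin and sum_bin_widths are hypothetical, and plain ints stand in for RangeBound. It only assumes that bin i spans bounds i and i + 1, so the last valid bin is hist_nvalues - 2.

```c
#include <assert.h>

/*
 * Hypothetical helper: first bin to visit when walking the histogram
 * downward. Bin i spans bounds[i] .. bounds[i + 1], so the last valid
 * bin index is hist_nvalues - 2.
 */
static int
clamp_start_bin(int upper_index, int hist_nvalues)
{
    int last_bin = hist_nvalues - 2;

    return (upper_index < last_bin) ? upper_index : last_bin;
}

/*
 * Stand-in for the real loop: walk bins from the clamped start down to 0
 * and sum their widths. The assert documents that bounds[i + 1] never
 * reads past the end of the array.
 */
static int
sum_bin_widths(const int *bounds, int hist_nvalues, int upper_index)
{
    int total = 0;

    for (int i = clamp_start_bin(upper_index, hist_nvalues); i >= 0; i--)
    {
        assert(i + 1 < hist_nvalues);
        total += bounds[i + 1] - bounds[i];
    }
    return total;
}
```

With four bounds, an upper_index of 3 is clamped to bin 2, while an upper_index of 1 is left alone. Starting at upper_index - 1 instead would begin at bin 0 in the second case and skip bin 1.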

I will dig into this during this CF. For now, that's my 2 cents.

Best regards, Andrey Borodin.
