Re: pg_stat_activity crashes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_activity crashes
Date: 2016-04-21 18:05:28
Message-ID: CA+TgmobYX6+d695SE3-UZ1HL326at-9sr4JzQxBW3GAW4exjEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 21, 2016 at 2:04 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Apr 20, 2016 at 10:56 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> At Wed, 20 Apr 2016 15:14:16 +0200, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote in <571780A8(dot)4070902(at)2ndquadrant(dot)com>
>>> I noticed sporadic segfaults when selecting from pg_stat_activity on
>>> current HEAD.
>>>
>>> The culprit is the 53be0b1add7064ca5db3cd884302dfc3268d884e commit
>>> which added more wait info into the pg_stat_get_activity(). More
>>> specifically, the following code is broken:
>>>
>>> + proc = BackendPidGetProc(beentry->st_procpid);
>>> + wait_event_type = pgstat_get_wait_event_type(proc->wait_event_info);
>>>
>>> This needs to check if proc is NULL. When reading the code I noticed
>>> that the new functions pg_stat_get_backend_wait_event_type() and
>>> pg_stat_get_backend_wait_event() suffer from the same problem.
>>
>> Good catch.
>
> Agreed.
>
>>> Here is PoC patch which fixes the problem. I am wondering if we should
>>> raise warning in the pg_stat_get_backend_wait_event_type() and
>>> pg_stat_get_backend_wait_event() like the pg_signal_backend() does
>>> when proc is NULL instead of just returning NULL which is what this
>>> patch does though.
>>
>> It still makes the two relevant columns in pg_stat_activity
>> inconsistent each other since it reads the procarray entry twice
>> without a lock on procarray.
>>
>> The attached patch adds pgstat_get_wait_event_info to read
>> wait_event_info in more appropriate way. Then change
>> pg_stat_get_wait_event(_type) to take the wait_event_info.
>>
>> Does this work for you?
>
> This is a hideously way of fixing this problem. The whole point of
> storing the wait event in a 4-byte integer is that we already assume
> reads of 4 byte integers are atomic and thus no lock is needed. The
> only thing we need to do is to prevent the value from being read
> twice, and we already have precedent for how to prevent that in
> freelist.c.

That was intended to say "a hideously expensive way".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-04-21 18:07:41 Re: max_parallel_degree > 0 for 9.6 beta
Previous Message Robert Haas 2016-04-21 18:04:52 Re: pg_stat_activity crashes