Re: BUG #13490: Segmentation fault on pg_stat_activity

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Michael Bommarito <michael(at)bommaritollc(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13490: Segmentation fault on pg_stat_activity
Date: 2015-07-13 06:54:25
Message-ID: CAB7nPqRurz+i5pUc=AFz7W-QJ-9x66TO8qRNCn5LtoR5voS5vQ@mail.gmail.com
Lists: pgsql-bugs

On Mon, Jul 13, 2015 at 4:16 AM, Michael Bommarito
<michael(at)bommaritollc(dot)com> wrote:
> This particular instance is from pghero, which is a monitoring tool. It
> can be reproduced simply by querying stat_activity in psql as well. Pghero
> is using prepared statements via ruby from a quick skim on their github
> repo.
>
> We have pg_stat_statements enabled, and can reproduce without pghero setup
> as well. No other extensions loaded.
>
> On Jul 12, 2015 2:37 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Michael Bommarito <michael(at)bommaritollc(dot)com> writes:
>> > Here's the session with debug_query_string:
>> > (gdb) printf "%s\n", debug_query_string
>> > SELECT application_name AS source, client_addr AS ip, COUNT(*) AS
>> > total_connections FROM pg_stat_activity WHERE pid <> pg_backend_pid()
>> > GROUP
>> > BY application_name, ip ORDER BY COUNT(*) DESC, application_name ASC,
>> > client_addr ASC
>>
>> Thanks. This still doesn't match the stack trace: in particular, this
>> stack frame
>>
>> #3 0x00007fd0d478152c in expression_tree_mutator (node=0x7fd0d5d9e908,
>> mutator=0x7fd0d481c390 <replace_rte_variables_mutator>,
>> context=0x7fff52170620) at
>> /tmp/buildd/postgresql-9.5-9.5~alpha1/build/../src/backend/nodes/nodeFuncs.c:2769
>>
>> indicates that we found a PlaceHolderInfo node in the expression tree that
>> pullup_replace_vars() was applied to, but so far as I can see no such node
>> should exist in the query tree generated by this query. The most likely
>> theory seems to be that something clobbered the query tree while it was
>> sitting in the plancache, causing this recursive function to follow a
>> bogus pointer. But that doesn't leave us with a lot to go on.
>>
>> What can you tell us about the environment this is happening in?
>> How is the client-side code executing the failing queries? (We know
>> it's using extended query protocol, but is it preparing a statement
>> and then executing it repeatedly, or just using a one-shot unnamed
>> prepared statement?) What nondefault settings are in use on the
>> server side? Do you have any extensions loaded, such as
>> pg_stat_statements or auto_explain?

FWIW, I have been experimenting with the query reported in the
backtrace upthread, using the extended query protocol to send BIND
messages with PQdescribePrepared and PQsendDescribePrepared, as well
as psql. While I am able to reproduce stack traces close to what you
had, I am not seeing any crashes. I have also run pghero with pgbench
running in parallel, and there were no problems with or without
pg_stat_statements loaded.

In the backtrace you sent previously
(http://www.postgresql.org/message-id/CAN=rtBipwKdHCtmXH3r4GNfUhF9e4ZfJbqcj7s_Ec9e2Mbf_LA@mail.gmail.com),
what is the value of MyProcPid? Is it 12803 or 20696? If it is the
former, do you have a backtrace for process 20696? What we are looking
at now may actually be a side effect of the real problem, and as long
as we do not have a real test case, I am afraid that finding the root
cause will be rather difficult.
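
For reference, a rough sketch of the gdb steps that would answer the
above, assuming the core file is still available and debug symbols are
installed. The binary and core paths below are placeholders, not taken
from your setup:

    $ gdb /path/to/postgres /path/to/core
    (gdb) print MyProcPid
    (gdb) bt

If process 20696 is a different backend and is still running, attaching
to it should give its backtrace as well:

    $ gdb -p 20696
    (gdb) bt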
--
Michael
